This notebook prepares all datasets for integration.

For each dataset, this involves:
* reading in the dataset
* checking that the existing metadata are correct
* adding metadata for site and cancer_subtype
* adding metadata for sample_type_major
* adding metadata for integration_id: samples that are not biologically distinct (e.g. two biopsies from one tumour) get the same id
* using integration_id to join and re-split layers, so that the layers in each dataset reflect how it will be integrated
* excluding any samples with <100 myeloid cells
* recording the number of cells per integration_id
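
The integration_id rule in the list above can be sketched in base R. This is a minimal illustration on a plain data frame standing in for the Seurat metadata; the sample names and the `shared_ids` lookup are hypothetical and do not come from any dataset in this notebook.

```r
# Hypothetical metadata: two biopsies from one tumour plus one unrelated sample.
meta <- data.frame(
  sample_id = c("DS1_pt1_biopsyA", "DS1_pt1_biopsyB", "DS1_pt2"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from sample_id to a shared integration_id.
shared_ids <- c(
  DS1_pt1_biopsyA = "DS1_pt1",
  DS1_pt1_biopsyB = "DS1_pt1"
)

# Default to sample_id, then overwrite only the samples that should be merged.
meta$integration_id <- meta$sample_id
to_merge <- meta$sample_id %in% names(shared_ids)
meta$integration_id[to_merge] <- unname(shared_ids[meta$sample_id[to_merge]])

print(meta)
```

In the datasets processed below every sample is treated as distinct, so integration_id is simply a copy of sample_id.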

Backing up to RDM:
```bash
rsync -azvhp /scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/ /QRISdata/Q5935/nikita/scdata/Myeloid_Cells/Myeloid_Cells_Integrate
```

In [1]:
#set wd
getwd()
setwd('/scratch/user/s4436039/scdata/Myeloid_Cells')
getwd()

In [2]:
#Load packages
library(dplyr)
library(Seurat)
library(patchwork)


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Loading required package: SeuratObject

Loading required package: sp


Attaching package: ‘SeuratObject’


The following object is masked from ‘package:base’:

    intersect




## GSE184880

In [41]:
HGSOC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE184880_myeloid.RDS")

In [42]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
27984 features across 7799 samples within 1 assay 
Active assay: RNA (27984 features, 2000 variable features)
 25 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, scale.data
 2 dimensional reductions calculated: pca, umap

cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
<rowname>,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE184880_Cancer1_AAACCCACAGCTGCCA-1,GSE184880,9374,2655,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,15.980371,1,1
GSE184880_Cancer1_AAACCCACATGACGGA-1,GSE184880,2659,1246,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.837909,1,1
GSE184880_Cancer1_AAACGAACAGTAGTGG-1,GSE184880,3020,1206,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,13.807947,1,1
GSE184880_Cancer1_AAACGAATCACCCTCA-1,GSE184880,50940,6660,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.531606,1,1
GSE184880_Cancer1_AAACGCTTCTCCACTG-1,GSE184880,10129,2880,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,11.225195,1,1
GSE184880_Cancer1_AAACGCTTCTGCTCTG-1,GSE184880,12756,3352,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,9.321104,1,1


In [43]:
table(HGSOC$sample_type)
table(HGSOC$cancer_type)
table(HGSOC$patient_id)
table(HGSOC$sample_id)


Healthy_ovary        tumour 
         1457          6342 


Healthy   HGSOC 
   1457    6342 


Cancer1 Cancer2 Cancer3 Cancer4 Cancer5 Cancer6 Cancer7   Norm1   Norm2   Norm3 
   2298    1080     577     792     695     652     248      54     281     360 
  Norm4   Norm5 
    193     569 


GSE184880_Healthy_Norm1 GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 
                     54                     281                     360 
GSE184880_Healthy_Norm4 GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 
                    193                     569                    2298 
GSE184880_HGSOC_Cancer2 GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 
                   1080                     577                     792 
GSE184880_HGSOC_Cancer5 GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    695                     652                     248 

In [44]:
#set site metadata
HGSOC@meta.data$site <- "ovary"

In [45]:
#set subtype metadata

#split by cancer_type
HGSOC_tumour <- subset(HGSOC, subset = cancer_type %in% c("HGSOC"))
HGSOC_healthy <- subset(HGSOC, subset = cancer_type %in% c("Healthy"))

HGSOC_tumour@meta.data$cancer_subtype <- "HGSOC"
HGSOC_healthy@meta.data$cancer_subtype <- "NA" #stored as the literal string "NA", not a missing value

HGSOC_tumour@meta.data$sample_type_major <- "primary tumour"
HGSOC_healthy@meta.data$sample_type_major <- "healthy"

#Merge seurat objects back together
HGSOC <- merge(HGSOC_tumour, y = c(HGSOC_healthy), project = "GSE184880")
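
The split-and-merge above also renames the layers (the object printed further down lists layers such as counts.1.1). As a possible alternative, the same labels could be assigned in place with vectorised `ifelse()`. The sketch below uses a plain data frame as a stand-in for the object's metadata; applying the same vectors to `HGSOC$cancer_subtype` and `HGSOC$sample_type_major` is an assumption that was not run in this notebook.

```r
# Stand-in for the object's metadata, using the cancer_type labels from this dataset.
meta <- data.frame(
  cancer_type = c("HGSOC", "HGSOC", "Healthy"),
  stringsAsFactors = FALSE
)

# TRUE for tumour cells, FALSE for healthy cells.
is_tumour <- meta$cancer_type == "HGSOC"

# Same values as the cell above; "NA" is kept as a literal string.
meta$cancer_subtype <- ifelse(is_tumour, "HGSOC", "NA")
meta$sample_type_major <- ifelse(is_tumour, "primary tumour", "healthy")

print(meta)
```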

In [46]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [47]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
27984 features across 7799 samples within 1 assay 
Active assay: RNA (27984 features, 2000 variable features)
 26 layers present: counts.1.1, counts.10.2, counts.11.2, counts.12.2, counts.2.1, counts.3.1, counts.4.1, counts.5.1, counts.6.1, counts.7.1, data.1.1, data.2.1, data.3.1, data.4.1, data.5.1, data.6.1, data.7.1, scale.data.1, counts.8.2, counts.9.2, data.8.2, data.9.2, data.10.2, data.11.2, data.12.2, scale.data.2

cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,cancer_subtype,sample_type_major,integration_id
<rowname>,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE184880_Cancer1_AAACCCACAGCTGCCA-1,GSE184880,9374,2655,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,15.980371,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACCCACATGACGGA-1,GSE184880,2659,1246,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.837909,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGAACAGTAGTGG-1,GSE184880,3020,1206,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,13.807947,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGAATCACCCTCA-1,GSE184880,50940,6660,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,8.531606,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGCTTCTCCACTG-1,GSE184880,10129,2880,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,11.225195,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1
GSE184880_Cancer1_AAACGCTTCTGCTCTG-1,GSE184880,12756,3352,tumour,HGSOC,Cancer1,GSE184880_HGSOC_Cancer1,9.321104,1,1,ovary,HGSOC,primary tumour,GSE184880_HGSOC_Cancer1


In [48]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#exclude Norm1 (54 cells)
HGSOC <- subset(HGSOC, subset = integration_id != "GSE184880_Healthy_Norm1")
table(HGSOC$integration_id)


GSE184880_Healthy_Norm1 GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 
                     54                     281                     360 
GSE184880_Healthy_Norm4 GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 
                    193                     569                    2298 
GSE184880_HGSOC_Cancer2 GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 
                   1080                     577                     792 
GSE184880_HGSOC_Cancer5 GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    695                     652                     248 


GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 GSE184880_Healthy_Norm4 
                    281                     360                     193 
GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 GSE184880_HGSOC_Cancer2 
                    569                    2298                    1080 
GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 GSE184880_HGSOC_Cancer5 
                    577                     792                     695 
GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    652                     248 
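
The sample to drop was picked by eye from the first table. A minimal base R sketch of finding under-threshold samples programmatically follows; the counts are copied from that table, while `min_cells`, `small` and `keep` are illustrative names that do not appear in the notebook.

```r
# Per-sample cell counts copied from the first table above.
n <- c(
  GSE184880_Healthy_Norm1 = 54,   GSE184880_Healthy_Norm2 = 281,
  GSE184880_Healthy_Norm3 = 360,  GSE184880_Healthy_Norm4 = 193,
  GSE184880_Healthy_Norm5 = 569,  GSE184880_HGSOC_Cancer1 = 2298,
  GSE184880_HGSOC_Cancer2 = 1080, GSE184880_HGSOC_Cancer3 = 577,
  GSE184880_HGSOC_Cancer4 = 792,  GSE184880_HGSOC_Cancer5 = 695,
  GSE184880_HGSOC_Cancer6 = 652,  GSE184880_HGSOC_Cancer7 = 248
)

# Rebuild a per-cell vector so table() behaves like table(HGSOC$integration_id).
integration_id <- rep(names(n), times = n)
counts <- table(integration_id)

# Threshold from the notebook: exclude samples with fewer than 100 cells.
min_cells <- 100
small <- names(counts)[as.vector(counts) < min_cells]
keep <- setdiff(names(counts), small)

print(small)
```

The `keep` vector could then be used to filter cells by integration_id, which avoids hard-coding sample names per dataset.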

In [49]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
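
As a conceptual analogue only (this is not how Seurat implements it), splitting by integration_id amounts to partitioning the columns of one joined matrix into one sub-matrix per sample. The toy matrix and sample names below are made up for illustration.

```r
# Toy joined count matrix: 3 genes x 5 cells (values are arbitrary).
counts <- matrix(
  1:15, nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5))
)

# One integration_id per cell, in column order.
integration_id <- c("sampleA", "sampleA", "sampleB", "sampleB", "sampleB")

# Group cell names by integration_id, then take one sub-matrix per group.
cells_by_id <- split(colnames(counts), integration_id)
layers <- lapply(cells_by_id, function(cells) counts[, cells, drop = FALSE])

# Each list element plays the role of one per-sample layer.
print(sapply(layers, ncol))
```

Joining first and then splitting means the number of layers follows integration_id rather than whatever layer structure the object had when it was loaded or merged.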



In [50]:
#record number of cells
table(HGSOC$integration_id)


GSE184880_Healthy_Norm2 GSE184880_Healthy_Norm3 GSE184880_Healthy_Norm4 
                    281                     360                     193 
GSE184880_Healthy_Norm5 GSE184880_HGSOC_Cancer1 GSE184880_HGSOC_Cancer2 
                    569                    2298                    1080 
GSE184880_HGSOC_Cancer3 GSE184880_HGSOC_Cancer4 GSE184880_HGSOC_Cancer5 
                    577                     792                     695 
GSE184880_HGSOC_Cancer6 GSE184880_HGSOC_Cancer7 
                    652                     248 

In [51]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE184880_myeloid_int.RDS")

In [52]:
#remove all objects in R
rm(list = ls())

## GSE213243

In [53]:
HGSOC_tu <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE213243_Tumour_myeloid.RDS")
HGSOC_As <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE213243_Ascites_myeloid.RDS")

In [54]:
HGSOC_tu
HGSOC_tu@project.name
head(HGSOC_tu@meta.data)

HGSOC_As
HGSOC_As@project.name
head(HGSOC_As@meta.data)

An object of class Seurat 
58825 features across 804 samples within 1 assay 
Active assay: RNA (58825 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters
<rowname>,<fct>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE213243_tumour_AAAGGTACACGCAGTC-1,GSE213243,8050,2780,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,19.962733,3,3
GSE213243_tumour_AAATGGACACACGCCA-1,GSE213243,5854,2467,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,4.936795,3,3
GSE213243_tumour_AACAAAGCAATTTCCT-1,GSE213243,6073,2541,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,6.323069,3,3
GSE213243_tumour_AACACACGTAGCTTTG-1,GSE213243,13497,3862,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,5.319701,3,3
GSE213243_tumour_AACACACTCGCTGTTC-1,GSE213243,8644,3306,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,10.596946,3,3
GSE213243_tumour_AACAGGGCAACCCTAA-1,GSE213243,6263,2562,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,3.544627,3,3


An object of class Seurat 
58825 features across 2688 samples within 1 assay 
Active assay: RNA (58825 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters
<rowname>,<fct>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE213243_ascites_AAACCCAAGTAGCAAT-2,GSE213243,16943,4684,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,7.572449,5,5
GSE213243_ascites_AAACCCACAGTCGTTA-2,GSE213243,14219,3822,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,5.02145,1,1
GSE213243_ascites_AAACCCATCCGTAGTA-2,GSE213243,15634,4224,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,6.556224,5,5
GSE213243_ascites_AAACGAAAGTGCTCGC-2,GSE213243,3007,1377,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,28.766212,6,6
GSE213243_ascites_AAACGAAGTATGGTAA-2,GSE213243,13828,4227,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,4.122071,5,5
GSE213243_ascites_AAACGCTAGTATCTGC-2,GSE213243,12945,3944,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,8.937814,6,6


In [55]:
table(HGSOC_tu$sample_type)
table(HGSOC_tu$cancer_type)
table(HGSOC_tu$patient_id)
table(HGSOC_tu$sample_id)

table(HGSOC_As$sample_type)
table(HGSOC_As$cancer_type)
table(HGSOC_As$patient_id)
table(HGSOC_As$sample_id)


tumour 
   804 


HGSOC 
  804 


pt-1 
 804 


GSE213243_HGSOC_tumour 
                   804 


ascites 
   2688 


HGSOC 
 2688 


pt-1 
2688 


GSE213243_HGSOC_ascites 
                   2688 

In [56]:
#set site metadata
HGSOC_tu@meta.data$site <- "ovary"
HGSOC_As@meta.data$site <- "ascites fluid"

HGSOC_tu@meta.data$sample_type_major <- "primary tumour"
HGSOC_As@meta.data$sample_type_major <- "ascites"

In [57]:
#set subtype metadata (both samples are HGSOC, so no split is needed)
HGSOC_tu@meta.data$cancer_subtype <- "HGSOC"
HGSOC_As@meta.data$cancer_subtype <- "HGSOC"

In [58]:
#merge objects
HGSOC <- merge(HGSOC_tu, y = c(HGSOC_As), project = "GSE213243")

In [59]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [60]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)
tail(HGSOC@meta.data)

An object of class Seurat 
58825 features across 3492 samples within 1 assay 
Active assay: RNA (58825 features, 2000 variable features)
 6 layers present: counts.1, counts.2, data.1, scale.data.1, data.2, scale.data.2

cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
<rowname>,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE213243_tumour_AAAGGTACACGCAGTC-1,GSE213243,8050,2780,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,19.962733,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AAATGGACACACGCCA-1,GSE213243,5854,2467,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,4.936795,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACAAAGCAATTTCCT-1,GSE213243,6073,2541,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,6.323069,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACACACGTAGCTTTG-1,GSE213243,13497,3862,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,5.319701,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACACACTCGCTGTTC-1,GSE213243,8644,3306,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,10.596946,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour
GSE213243_tumour_AACAGGGCAACCCTAA-1,GSE213243,6263,2562,tumour,HGSOC,pt-1,GSE213243_HGSOC_tumour,3.544627,3,3,ovary,primary tumour,HGSOC,GSE213243_HGSOC_tumour


cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
<rowname>,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE213243_ascites_TTTGATCGTTAGGCCC-2,GSE213243,20342,4702,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,5.899125,5,5,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGATCTCTCGGCTT-2,GSE213243,1614,820,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,34.262701,1,1,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGGAGCACGTCTCT-2,GSE213243,10549,3639,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,11.119537,6,6,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGGAGGTCCTGGGT-2,GSE213243,4613,2061,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,12.421418,1,1,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGGTTCATCCTATT-2,GSE213243,6073,2678,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,11.954553,1,1,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites
GSE213243_ascites_TTTGTTGCATGATGCT-2,GSE213243,14293,4430,ascites,HGSOC,pt-1,GSE213243_HGSOC_ascites,5.044427,6,6,ascites fluid,ascites,HGSOC,GSE213243_HGSOC_ascites


In [61]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#none to exclude


GSE213243_HGSOC_ascites  GSE213243_HGSOC_tumour 
                   2688                     804 

In [62]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [63]:
#record number of cells
table(HGSOC$integration_id)


GSE213243_HGSOC_ascites  GSE213243_HGSOC_tumour 
                   2688                     804 
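
The counts above are only printed in the notebook. One way to keep a persistent record is to turn them into a small table and write it to disk. The sketch below uses the two counts shown above; the `record` data frame, its column names and the temporary output file are assumptions for illustration only.

```r
# Counts copied from the table above.
cell_counts <- c(GSE213243_HGSOC_ascites = 2688, GSE213243_HGSOC_tumour = 804)

# One row per integration_id, tagged with the dataset accession.
record <- data.frame(
  dataset = "GSE213243",
  integration_id = names(cell_counts),
  n_cells = unname(cell_counts),
  stringsAsFactors = FALSE
)

# Hypothetical output location; a temporary file keeps the sketch free of side effects.
out_file <- tempfile(fileext = ".csv")
write.csv(record, out_file, row.names = FALSE)

print(read.csv(out_file))
```

Appending one such table per dataset would give a single overview of cell numbers across all datasets prepared for integration.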

In [64]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE213243_myeloid_int.RDS")

In [65]:
#remove all objects in R
rm(list = ls())

## GSE217517

In [66]:
HGSOC <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE217517_myeloid.RDS")

In [67]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
36601 features across 8457 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,RNA_snn_res.0.2
<rowname>,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>
GSE217517_pt1_AAACGAAAGAACCCGA-1,GSE217517,7268,2217,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,3.76995,9,1,1
GSE217517_pt1_AAAGAACCAGGGCTTC-1,GSE217517,20132,4339,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.634612,5,1,1
GSE217517_pt1_AAAGAACTCCATGAGT-1,GSE217517,4183,1410,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,35.142242,9,1,1
GSE217517_pt1_AAAGGATTCTATTTCG-1,GSE217517,3037,1274,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,6.914718,9,1,1
GSE217517_pt1_AAATGGACACTGAGGA-1,GSE217517,9516,2822,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,2.847835,5,1,1
GSE217517_pt1_AACAGGGGTCATCGGC-1,GSE217517,22104,4611,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.69544,9,1,1


In [68]:
table(HGSOC$sample_type)
table(HGSOC$cancer_type)
table(HGSOC$patient_id)
table(HGSOC$sample_id)


tumour 
  8457 


HGSOC 
 8457 


 pt1  pt2  pt3  pt4  pt5  pt6  pt7  pt8 
 842  966 2678 1517 1004   37 1054  359 


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt6 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                  37                1054                 359 

In [70]:
#set site metadata
HGSOC@meta.data$site <- "ovary"
HGSOC@meta.data$sample_type_major <- "primary tumour"

In [71]:
#set subtype metadata
HGSOC@meta.data$cancer_subtype <- "HGSOC"

In [72]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [73]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
36601 features across 8457 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

cell,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.5,seurat_clusters,RNA_snn_res.0.2,site,sample_type_major,cancer_subtype,integration_id
<rowname>,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE217517_pt1_AAACGAAAGAACCCGA-1,GSE217517,7268,2217,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,3.76995,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAAGAACCAGGGCTTC-1,GSE217517,20132,4339,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.634612,5,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAAGAACTCCATGAGT-1,GSE217517,4183,1410,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,35.142242,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAAGGATTCTATTTCG-1,GSE217517,3037,1274,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,6.914718,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AAATGGACACTGAGGA-1,GSE217517,9516,2822,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,2.847835,5,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1
GSE217517_pt1_AACAGGGGTCATCGGC-1,GSE217517,22104,4611,tumour,HGSOC,pt1,GSE217517_HGSOC_pt1,7.69544,9,1,1,ovary,primary tumour,HGSOC,GSE217517_HGSOC_pt1


In [74]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#exclude patient 6 (37 cells)
HGSOC <- subset(HGSOC, subset = integration_id != "GSE217517_HGSOC_pt6")
table(HGSOC$integration_id)


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt6 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                  37                1054                 359 


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                1054                 359 

In [75]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [76]:
#record number of cells
table(HGSOC$integration_id)


GSE217517_HGSOC_pt1 GSE217517_HGSOC_pt2 GSE217517_HGSOC_pt3 GSE217517_HGSOC_pt4 
                842                 966                2678                1517 
GSE217517_HGSOC_pt5 GSE217517_HGSOC_pt7 GSE217517_HGSOC_pt8 
               1004                1054                 359 

In [77]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE217517_myeloid_int.RDS")

In [78]:
#remove all objects in R
rm(list = ls())

## PRJCA005422

In [79]:
HGSOC_As <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PRJCA005422_ascites_myeloid.RDS")
HGSOC_Tu <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PRJCA005422_tumour_myeloid.RDS")

In [80]:
HGSOC_As
HGSOC_As@project.name
head(HGSOC_As@meta.data)

HGSOC_Tu
HGSOC_Tu@project.name
head(HGSOC_Tu@meta.data)

An object of class Seurat 
27127 features across 16120 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

cell,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,maintypes_2,maintypes_3,UMAP_1,UMAP_2,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters
<rowname>,<fct>,<dbl>,<int>,<chr>,<chr>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,⋯,<fct>,<chr>,<dbl>,<dbl>,<fct>,<chr>,<fct>,<chr>,<fct>,<fct>
PRJCA005422_EOC1_FS_cell_AACACGTGTCGGCACT,EOC1,24353,1770,EOC1_FS_cell_AACACGTGTCGGCACT,HGSOC1_AS,Ascites,HGSOC1,0.8007227,3.859894,0.151932,⋯,B,Lymphoid cells,0.1662883,13.255426,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_AACCGCGTCCCTAACC,EOC1,531,365,EOC1_FS_cell_AACCGCGTCCCTAACC,HGSOC1_AS,Ascites,HGSOC1,5.6497175,18.796992,0.1879699,⋯,B,Lymphoid cells,-6.0097427,11.557367,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,4,4
PRJCA005422_EOC1_FS_cell_AACTCCCAGTTTCCTT,EOC1,9273,3128,EOC1_FS_cell_AACTCCCAGTTTCCTT,HGSOC1_AS,Ascites,HGSOC1,3.5479349,10.673854,0.4743935,⋯,Proliferative cells,Proliferative cells,2.3708313,2.94219,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_AAGGCAGGTTAAAGTG,EOC1,4757,2120,EOC1_FS_cell_AAGGCAGGTTAAAGTG,HGSOC1_AS,Ascites,HGSOC1,9.7750683,15.762926,0.6094998,⋯,Proliferative cells,Proliferative cells,2.5225659,2.890291,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_AAGGCAGTCAACACTG,EOC1,19574,3727,EOC1_FS_cell_AAGGCAGTCAACACTG,HGSOC1_AS,Ascites,HGSOC1,7.1319097,24.63857,0.3473819,⋯,Proliferative cells,Proliferative cells,1.6956519,3.476046,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8
PRJCA005422_EOC1_FS_cell_ACACCAAAGCTAACTC,EOC1,22514,3971,EOC1_FS_cell_ACACCAAAGCTAACTC,HGSOC1_AS,Ascites,HGSOC1,5.7208848,20.034642,0.7683425,⋯,Proliferative cells,Proliferative cells,2.063814,12.25453,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8


An object of class Seurat 
27127 features across 13256 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 3 layers present: counts, data, scale.data
 2 dimensional reductions calculated: pca, umap

cell,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,maintypes_2,maintypes_3,UMAP_1,UMAP_2,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters
<rowname>,<fct>,<dbl>,<int>,<chr>,<chr>,<fct>,<fct>,<dbl>,<dbl>,<dbl>,⋯,<fct>,<chr>,<dbl>,<dbl>,<fct>,<chr>,<fct>,<chr>,<fct>,<fct>
PRJCA005422_EOC1_OC_cell_CCACGGACACCAGGCT,EOC1,631,422,EOC1_OC_cell_CCACGGACACCAGGCT,HGSOC1_PT,Primary Tumor,HGSOC1,5.0713154,16.4557,1.2658228,⋯,Proliferative cells,Proliferative cells,1.570391,2.864907,Primary Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_PT,0,0
PRJCA005422_EOC1_OC_cell_CCTACACAGAGTCTGG,EOC1,639,368,EOC1_OC_cell_CCTACACAGAGTCTGG,HGSOC1_PT,Primary Tumor,HGSOC1,3.7558685,25.0,1.40625,⋯,Proliferative cells,Proliferative cells,2.063956,3.946421,Primary Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_PT,0,0
PRJCA005422_EOC1_TM_cell_AGTTGGTTCACGCATA,EOC1,651,394,EOC1_TM_cell_AGTTGGTTCACGCATA,HGSOC1_MT,Metastatic Tumor,HGSOC1,0.4608295,26.72811,0.1536098,⋯,Proliferative cells,Proliferative cells,1.906724,3.60265,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0
PRJCA005422_EOC1_TM_cell_CGATCGGCACGCTTTC,EOC1,1480,787,EOC1_TM_cell_CGATCGGCACGCTTTC,HGSOC1_MT,Metastatic Tumor,HGSOC1,0.4054054,25.60811,0.6081081,⋯,Proliferative cells,Proliferative cells,2.0147,3.495903,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0
PRJCA005422_EOC1_TM_cell_CTCTAATTCTTTACGT,EOC1,1067,522,EOC1_TM_cell_CTCTAATTCTTTACGT,HGSOC1_MT,Metastatic Tumor,HGSOC1,1.2183693,38.23805,0.1874414,⋯,Proliferative cells,Proliferative cells,1.414386,3.496294,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0
PRJCA005422_EOC1_TM_cell_GCCTCTACACGGTTTA,EOC1,1629,792,EOC1_TM_cell_GCCTCTACACGGTTTA,HGSOC1_MT,Metastatic Tumor,HGSOC1,0.0,26.27379,0.9821977,⋯,Proliferative cells,Proliferative cells,1.58388,3.464499,Metastatic Tumor,HGSOC,HGSOC1,PRJCA005422_HGSOC1_MT,0,0


In [81]:
table(HGSOC_As$sample_type)
table(HGSOC_As$cancer_type)
table(HGSOC_As$patient_id)
table(HGSOC_As$sample_id)

table(HGSOC_Tu$sample_type)
table(HGSOC_Tu$cancer_type)
table(HGSOC_Tu$patient_id)
table(HGSOC_Tu$sample_id)


   Primary Tumor Metastatic Tumor       Lymph Node          Ascites 
               0                0                0            16120 
            PBMC 
               0 


HGSOC 
16120 


 HGSOC1  HGSOC2  HGSOC3  HGSOC4  HGSOC5  HGSOC6  HGSOC7  HGSOC8  HGSOC9 HGSOC10 
   1149    6695     662       0    1743     829       0    1110    3589     343 
   ECO1    UOC1   OCCC1      C1 
      0       0       0       0 


 PRJCA005422_HGSOC1_AS PRJCA005422_HGSOC10_AS  PRJCA005422_HGSOC2_AS 
                  1149                    343                   6695 
 PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC6_AS 
                   662                   1743                    829 
 PRJCA005422_HGSOC8_AS  PRJCA005422_HGSOC9_AS 
                  1110                   3589 


   Primary Tumor Metastatic Tumor       Lymph Node          Ascites 
            8041             5215                0                0 
            PBMC 
               0 


HGSOC 
13256 


 HGSOC1  HGSOC2  HGSOC3  HGSOC4  HGSOC5  HGSOC6  HGSOC7  HGSOC8  HGSOC9 HGSOC10 
   2639     633    3523    1104      70    2150    1179     121    1087     750 
   ECO1    UOC1   OCCC1      C1 
      0       0       0       0 


 PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT PRJCA005422_HGSOC10_PT 
                  1231                   1408                    750 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_MT  PRJCA005422_HGSOC3_PT 
                   633                   1711                   1812 
 PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT  PRJCA005422_HGSOC5_PT 
                   816                    288                     70 
 PRJCA005422_HGSOC6_MT  PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT 
                  1457                    693                   1179 
 PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_PT 
                   121                   1087 

In [82]:
#set site metadata
HGSOC_As@meta.data$site <- "ascites fluid"
HGSOC_As@meta.data$sample_type_major <- "ascites"

#HGSOC_Tu needs to be split into primary and metastatic samples, which come from different sites
HGSOC_Pr <- subset(HGSOC_Tu, subset = sample_type %in% c("Primary Tumor"))
HGSOC_Me <- subset(HGSOC_Tu, subset = sample_type %in% c("Metastatic Tumor"))

HGSOC_Pr@meta.data$site <- "ovary"
HGSOC_Me@meta.data$site <- "omentum"

HGSOC_Pr@meta.data$sample_type_major <- "primary tumour"
HGSOC_Me@meta.data$sample_type_major <- "metastatic tumour"

#Merge seurat objects back together
HGSOC <- merge(HGSOC_As, y = c(HGSOC_Pr, HGSOC_Me), project = "PRJCA005422")
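
The site and sample_type_major values assigned above follow a fixed mapping from sample_type, which can also be written as lookup vectors. The sketch below runs on a plain data frame standing in for the metadata; `site_map` and `major_map` are illustrative names. Because sample_type is a factor in this dataset, it is converted with `as.character()` first, since indexing a named vector with a factor would use the integer codes instead of the labels.

```r
# Stand-in metadata; sample_type is a factor, as in the PRJCA005422 objects.
meta <- data.frame(
  sample_type = factor(c("Ascites", "Primary Tumor", "Metastatic Tumor"))
)

# Lookup vectors mirroring the values assigned in the cell above.
site_map <- c(
  "Ascites" = "ascites fluid",
  "Primary Tumor" = "ovary",
  "Metastatic Tumor" = "omentum"
)
major_map <- c(
  "Ascites" = "ascites",
  "Primary Tumor" = "primary tumour",
  "Metastatic Tumor" = "metastatic tumour"
)

# Convert the factor to its labels before using it as a lookup key.
key <- as.character(meta$sample_type)
meta$site <- unname(site_map[key])
meta$sample_type_major <- unname(major_map[key])

# A label missing from the lookup would surface as NA and stop here.
stopifnot(!anyNA(meta$site), !anyNA(meta$sample_type_major))
print(meta)
```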

In [84]:
#set subtype metadata
HGSOC@meta.data$cancer_subtype <- "HGSOC"

In [85]:
#set integration_id metadata
HGSOC@meta.data$integration_id <- HGSOC@meta.data$sample_id

In [86]:
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)

An object of class Seurat 
27127 features across 29376 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 9 layers present: counts.1, counts.2, counts.3, data.1, scale.data.1, data.2, scale.data.2, data.3, scale.data.3

cell,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
<rowname>,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJCA005422_EOC1_FS_cell_AACACGTGTCGGCACT,EOC1,24353,1770,EOC1_FS_cell_AACACGTGTCGGCACT,HGSOC1_AS,Ascites,HGSOC1,0.8007227,3.859894,0.151932,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACCGCGTCCCTAACC,EOC1,531,365,EOC1_FS_cell_AACCGCGTCCCTAACC,HGSOC1_AS,Ascites,HGSOC1,5.6497175,18.796992,0.1879699,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,4,4,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACTCCCAGTTTCCTT,EOC1,9273,3128,EOC1_FS_cell_AACTCCCAGTTTCCTT,HGSOC1_AS,Ascites,HGSOC1,3.5479349,10.673854,0.4743935,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGGTTAAAGTG,EOC1,4757,2120,EOC1_FS_cell_AAGGCAGGTTAAAGTG,HGSOC1_AS,Ascites,HGSOC1,9.7750683,15.762926,0.6094998,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGTCAACACTG,EOC1,19574,3727,EOC1_FS_cell_AAGGCAGTCAACACTG,HGSOC1_AS,Ascites,HGSOC1,7.1319097,24.63857,0.3473819,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_ACACCAAAGCTAACTC,EOC1,22514,3971,EOC1_FS_cell_ACACCAAAGCTAACTC,HGSOC1_AS,Ascites,HGSOC1,5.7208848,20.034642,0.7683425,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS


In [87]:
#exclude any samples with <100 cells
table(HGSOC$integration_id)
#exclude patient HGSOC5 primary tumour (70 cells)
HGSOC <- subset(HGSOC, subset = integration_id != "PRJCA005422_HGSOC5_PT")
table(HGSOC$integration_id)


 PRJCA005422_HGSOC1_AS  PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT 
                  1149                   1231                   1408 
PRJCA005422_HGSOC10_AS PRJCA005422_HGSOC10_PT  PRJCA005422_HGSOC2_AS 
                   343                    750                   6695 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC3_MT 
                   633                    662                   1711 
 PRJCA005422_HGSOC3_PT  PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT 
                  1812                    816                    288 
 PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC5_PT  PRJCA005422_HGSOC6_AS 
                  1743                     70                    829 
 PRJCA005422_HGSOC6_MT  PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT 
                  1457                    693                   1179 
 PRJCA005422_HGSOC8_AS  PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_AS 
                  1110                    121                   3589 
 PRJCA005422_HGSOC9


 PRJCA005422_HGSOC1_AS  PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT 
                  1149                   1231                   1408 
PRJCA005422_HGSOC10_AS PRJCA005422_HGSOC10_PT  PRJCA005422_HGSOC2_AS 
                   343                    750                   6695 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC3_MT 
                   633                    662                   1711 
 PRJCA005422_HGSOC3_PT  PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT 
                  1812                    816                    288 
 PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC6_AS  PRJCA005422_HGSOC6_MT 
                  1743                    829                   1457 
 PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT  PRJCA005422_HGSOC8_AS 
                   693                   1179                   1110 
 PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_AS  PRJCA005422_HGSOC9_PT 
                   121                   3589                   1087 
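
The samples to drop were read off the table by eye above. As a minimal sketch, the same decision could be derived from the counts directly. The `integration_id` vector below is a hypothetical stand-in for `HGSOC$integration_id`, built from three of the counts printed above, and `min_cells`, `small_samples` and `keep` are illustrative names rather than anything defined in this notebook.

```r
# Hypothetical stand-in for HGSOC$integration_id: one entry per cell,
# using three of the per-sample counts printed above.
integration_id <- rep(
  c("PRJCA005422_HGSOC5_AS", "PRJCA005422_HGSOC5_PT", "PRJCA005422_HGSOC8_PT"),
  times = c(1743, 70, 121)
)

min_cells <- 100  # threshold used throughout this notebook

# Count cells per integration_id and collect the ids below the threshold.
cell_counts <- table(integration_id)
small_samples <- names(cell_counts)[as.vector(cell_counts) < min_cells]
small_samples

# Logical mask of the cells that would be kept.
keep <- !(integration_id %in% small_samples)
sum(keep)
```

With a real object, `small_samples` could be passed to the `subset()` call in place of the hand-typed vector, which avoids retyping ids for each dataset.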

In [88]:
#join layers and then split them by integration_id
Layers(HGSOC[["RNA"]])
#join layers
HGSOC[["RNA"]] <- JoinLayers(HGSOC[["RNA"]])
Layers(HGSOC[["RNA"]])
#split layers
HGSOC[["RNA"]] <- split(HGSOC[["RNA"]], f = HGSOC$integration_id)
Layers(HGSOC[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [89]:
#record number of cells
HGSOC
HGSOC@project.name
head(HGSOC@meta.data)
tail(HGSOC@meta.data)
table(HGSOC$integration_id)

An object of class Seurat 
27127 features across 29306 samples within 1 assay 
Active assay: RNA (27127 features, 2000 variable features)
 43 layers present: counts.PRJCA005422_HGSOC1_AS, counts.PRJCA005422_HGSOC3_AS, counts.PRJCA005422_HGSOC2_AS, counts.PRJCA005422_HGSOC6_AS, counts.PRJCA005422_HGSOC5_AS, counts.PRJCA005422_HGSOC8_AS, counts.PRJCA005422_HGSOC9_AS, counts.PRJCA005422_HGSOC10_AS, counts.PRJCA005422_HGSOC1_PT, counts.PRJCA005422_HGSOC3_PT, counts.PRJCA005422_HGSOC2_PT, counts.PRJCA005422_HGSOC7_PT, counts.PRJCA005422_HGSOC6_PT, counts.PRJCA005422_HGSOC4_PT, counts.PRJCA005422_HGSOC8_PT, counts.PRJCA005422_HGSOC9_PT, counts.PRJCA005422_HGSOC10_PT, counts.PRJCA005422_HGSOC1_MT, counts.PRJCA005422_HGSOC3_MT, counts.PRJCA005422_HGSOC6_MT, counts.PRJCA005422_HGSOC4_MT, scale.data, data.PRJCA005422_HGSOC1_AS, data.PRJCA005422_HGSOC3_AS, data.PRJCA005422_HGSOC2_AS, data.PRJCA005422_HGSOC6_AS, data.PRJCA005422_HGSOC5_AS, data.PRJCA005422_HGSOC8_AS, data.PRJCA005422_HGSOC9_AS, da

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJCA005422_EOC1_FS_cell_AACACGTGTCGGCACT,EOC1,24353,1770,EOC1_FS_cell_AACACGTGTCGGCACT,HGSOC1_AS,Ascites,HGSOC1,0.8007227,3.859894,0.151932,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACCGCGTCCCTAACC,EOC1,531,365,EOC1_FS_cell_AACCGCGTCCCTAACC,HGSOC1_AS,Ascites,HGSOC1,5.6497175,18.796992,0.1879699,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,4,4,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AACTCCCAGTTTCCTT,EOC1,9273,3128,EOC1_FS_cell_AACTCCCAGTTTCCTT,HGSOC1_AS,Ascites,HGSOC1,3.5479349,10.673854,0.4743935,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGGTTAAAGTG,EOC1,4757,2120,EOC1_FS_cell_AAGGCAGGTTAAAGTG,HGSOC1_AS,Ascites,HGSOC1,9.7750683,15.762926,0.6094998,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_AAGGCAGTCAACACTG,EOC1,19574,3727,EOC1_FS_cell_AAGGCAGTCAACACTG,HGSOC1_AS,Ascites,HGSOC1,7.1319097,24.63857,0.3473819,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS
PRJCA005422_EOC1_FS_cell_ACACCAAAGCTAACTC,EOC1,22514,3971,EOC1_FS_cell_ACACCAAAGCTAACTC,HGSOC1_AS,Ascites,HGSOC1,5.7208848,20.034642,0.7683425,⋯,Ascites,HGSOC,HGSOC1,PRJCA005422_HGSOC1_AS,8,8,ascites fluid,ascites,HGSOC,PRJCA005422_HGSOC1_AS


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,Cellname,Samples,Groups,Patients,percent.mt,percent.ribo,percent.HSP,⋯,sample_type,cancer_type,patient_id,sample_id,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,⋯,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJCA005422_EOC4_TM_cell_TTTCCTCGTTCAACCA,EOC4,12786,3090,EOC4_TM_cell_TTTCCTCGTTCAACCA,HGSOC4_MT,Metastatic Tumor,HGSOC4,0.9619897,7.382498,0.4223039,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTCCTCTCTGGTATG,EOC4,13288,3440,EOC4_TM_cell_TTTCCTCTCTGGTATG,HGSOC4_MT,Metastatic Tumor,HGSOC4,3.6950632,15.937994,0.3988261,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTCCTCTCTGGTTCC,EOC4,16083,3128,EOC4_TM_cell_TTTCCTCTCTGGTTCC,HGSOC4_MT,Metastatic Tumor,HGSOC4,2.8352919,7.181049,0.32952,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTCCTCTCTGTCTCG,EOC4,23953,4481,EOC4_TM_cell_TTTCCTCTCTGTCTCG,HGSOC4_MT,Metastatic Tumor,HGSOC4,2.91404,8.449176,0.4049259,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTGCGCAGGAGTCTG,EOC4,17788,3363,EOC4_TM_cell_TTTGCGCAGGAGTCTG,HGSOC4_MT,Metastatic Tumor,HGSOC4,4.1207556,15.797167,0.387902,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT
PRJCA005422_EOC4_TM_cell_TTTGTCATCCCAACGG,EOC4,16918,3537,EOC4_TM_cell_TTTGTCATCCCAACGG,HGSOC4_MT,Metastatic Tumor,HGSOC4,7.4063128,20.745907,0.4787517,⋯,Metastatic Tumor,HGSOC,HGSOC4,PRJCA005422_HGSOC4_MT,0,0,omentum,metastatic tumour,HGSOC,PRJCA005422_HGSOC4_MT



 PRJCA005422_HGSOC1_AS  PRJCA005422_HGSOC1_MT  PRJCA005422_HGSOC1_PT 
                  1149                   1231                   1408 
PRJCA005422_HGSOC10_AS PRJCA005422_HGSOC10_PT  PRJCA005422_HGSOC2_AS 
                   343                    750                   6695 
 PRJCA005422_HGSOC2_PT  PRJCA005422_HGSOC3_AS  PRJCA005422_HGSOC3_MT 
                   633                    662                   1711 
 PRJCA005422_HGSOC3_PT  PRJCA005422_HGSOC4_MT  PRJCA005422_HGSOC4_PT 
                  1812                    816                    288 
 PRJCA005422_HGSOC5_AS  PRJCA005422_HGSOC6_AS  PRJCA005422_HGSOC6_MT 
                  1743                    829                   1457 
 PRJCA005422_HGSOC6_PT  PRJCA005422_HGSOC7_PT  PRJCA005422_HGSOC8_AS 
                   693                   1179                   1110 
 PRJCA005422_HGSOC8_PT  PRJCA005422_HGSOC9_AS  PRJCA005422_HGSOC9_PT 
                   121                   3589                   1087 

In [90]:
#re-export seurat object ready for integration
saveRDS(HGSOC, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/PRJCA005422_myeloid_int.RDS")

In [91]:
#remove all objects in R
rm(list = ls())

## GSE200218

In [123]:
MEL <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE200218_myeloid.RDS")

In [124]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
36601 features across 10371 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, data.1, data.2, data.3, data.4, data.5, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE200218_MBM01_AAACCTGAGCTGCAAG-1,GSE200218,15791,4063,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.439238,0,0
GSE200218_MBM01_AAACCTGCAATCGGTT-1,GSE200218,29993,5932,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.270763,2,2
GSE200218_MBM01_AAACCTGGTACTTGAC-1,GSE200218,21267,5177,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.28208,7,7
GSE200218_MBM01_AAACCTGGTTAAGATG-1,GSE200218,25744,5563,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.899938,0,0
GSE200218_MBM01_AAACCTGTCACGGTTA-1,GSE200218,14369,3779,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.257012,0,0
GSE200218_MBM01_AAACCTGTCCGCATCT-1,GSE200218,3921,2039,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.41214,0,0


In [125]:
table(MEL$sample_type)
table(MEL$cancer_type)
table(MEL$patient_id)
table(MEL$sample_id)


metastasis 
     10371 


melanoma brain mets 
              10371 


MBM01 MBM02 MBM03 MBM04 MBM05 
 1411  2035  1945  3143  1837 


GSE200218_MBM01 GSE200218_MBM02 GSE200218_MBM03 GSE200218_MBM04 GSE200218_MBM05 
           1411            2035            1945            3143            1837 

In [126]:
#set site and sample_type_major metadata
MEL@meta.data$site <- "brain"
MEL@meta.data$sample_type_major <- "metastatic tumour"

In [127]:
#set subtype metadata
MEL@meta.data$cancer_subtype <- "Melanoma"

In [128]:
#set integration_id metadata
MEL@meta.data$integration_id <- MEL@meta.data$sample_id

In [129]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
36601 features across 10371 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, data.1, data.2, data.3, data.4, data.5, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE200218_MBM01_AAACCTGAGCTGCAAG-1,GSE200218,15791,4063,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.439238,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGCAATCGGTT-1,GSE200218,29993,5932,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.270763,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTACTTGAC-1,GSE200218,21267,5177,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.28208,7,7,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTTAAGATG-1,GSE200218,25744,5563,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.899938,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCACGGTTA-1,GSE200218,14369,3779,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.257012,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCCGCATCT-1,GSE200218,3921,2039,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.41214,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01


In [130]:
#exclude any samples with <100 cells
table(MEL$integration_id)
#none to exclude


GSE200218_MBM01 GSE200218_MBM02 GSE200218_MBM03 GSE200218_MBM04 GSE200218_MBM05 
           1411            2035            1945            3143            1837 

In [131]:
#join layers and then split them by integration_id
Layers(MEL[["RNA"]])
#join layers
MEL[["RNA"]] <- JoinLayers(MEL[["RNA"]])
Layers(MEL[["RNA"]])
#split layers
MEL[["RNA"]] <- split(MEL[["RNA"]], f = MEL$integration_id)
Layers(MEL[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
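
After the join and split, each `integration_id` should own one `counts` layer and one `data` layer, with `scale.data` left unsplit as the message above says. The sketch below shows one way that expectation could be checked with base R only. `observed_layers` is a hypothetical stand-in for the output of `Layers(MEL[["RNA"]])`, and the ids are the five samples in this dataset.

```r
# Integration ids for this dataset (GSE200218_MBM01 ... GSE200218_MBM05).
ids <- sprintf("GSE200218_MBM%02d", 1:5)

# Layer names expected after the split: one counts and one data layer per
# id, plus the single unsplit scale.data layer.
expected_layers <- c(paste0("counts.", ids), paste0("data.", ids), "scale.data")

# Hypothetical stand-in for Layers(MEL[["RNA"]]); the order is deliberately
# different, because layer order is not guaranteed.
observed_layers <- c(paste0("data.", ids), "scale.data", paste0("counts.", ids))

length(expected_layers)
setequal(expected_layers, observed_layers)
```

`setequal()` is used rather than `identical()` so that the comparison ignores the order in which the layers are listed.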



In [132]:
#record number of cells
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)
table(MEL$integration_id)

An object of class Seurat 
36601 features across 10371 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: data.GSE200218_MBM01, data.GSE200218_MBM02, data.GSE200218_MBM03, data.GSE200218_MBM04, data.GSE200218_MBM05, scale.data, counts.GSE200218_MBM01, counts.GSE200218_MBM02, counts.GSE200218_MBM03, counts.GSE200218_MBM04, counts.GSE200218_MBM05
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE200218_MBM01_AAACCTGAGCTGCAAG-1,GSE200218,15791,4063,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.439238,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGCAATCGGTT-1,GSE200218,29993,5932,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.270763,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTACTTGAC-1,GSE200218,21267,5177,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.28208,7,7,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGGTTAAGATG-1,GSE200218,25744,5563,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.899938,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCACGGTTA-1,GSE200218,14369,3779,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,3.257012,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01
GSE200218_MBM01_AAACCTGTCCGCATCT-1,GSE200218,3921,2039,metastasis,melanoma brain mets,MBM01,GSE200218_MBM01,4.41214,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM01


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE200218_MBM05_TTTGCGCTCATGCTCC-1,GSE200218,9728,2563,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,9.477796,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGCGCTCCTCGCAT-1,GSE200218,13511,3309,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,3.449042,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGGTTTCTCTAAGG-1,GSE200218,15440,3540,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,4.410622,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGTCAAGATGGCGT-1,GSE200218,10913,2889,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,4.370934,2,2,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGTCAAGTTAGGTA-1,GSE200218,5539,2048,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,5.361979,0,0,brain,metastatic tumour,Melanoma,GSE200218_MBM05
GSE200218_MBM05_TTTGTCATCTTGGGTA-1,GSE200218,1780,1076,metastasis,melanoma brain mets,MBM05,GSE200218_MBM05,10.561798,7,7,brain,metastatic tumour,Melanoma,GSE200218_MBM05



GSE200218_MBM01 GSE200218_MBM02 GSE200218_MBM03 GSE200218_MBM04 GSE200218_MBM05 
           1411            2035            1945            3143            1837 

In [134]:
#re-export seurat object ready for integration
saveRDS(MEL, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE200218_myeloid_int.RDS")

In [135]:
#remove all objects in R
rm(list = ls())

## GSE215120

In [136]:
MEL_Ac <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE215120_AcMEL_myeloid.RDS")
MEL_Cu <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE215120_CuMEL_myeloid.RDS")

In [137]:
MEL_Ac
MEL_Ac@project.name
head(MEL_Ac@meta.data)

MEL_Cu
MEL_Cu@project.name
head(MEL_Cu@meta.data)

An object of class Seurat 
33538 features across 787 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 13 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, data.1, data.2, data.3, data.4, data.5, data.6, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE215120_AM1_AAACCTGGTTGCTCCT-1,GSE215120,20298,3789,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,0.9754656,12,12
GSE215120_AM1_AAAGATGTCCAAATGC-1,GSE215120,5574,1721,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,6.0459275,12,12
GSE215120_AM1_AAAGTAGTCGGTGTTA-1,GSE215120,13432,2759,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,2.1515783,12,12
GSE215120_AM1_AAATGCCCAGAGCCAA-1,GSE215120,17143,2659,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.2249898,12,12
GSE215120_AM1_AAATGCCGTTTGGCGC-1,GSE215120,3603,1012,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,3.6081044,12,12
GSE215120_AM1_AAATGCCTCATGTCCC-1,GSE215120,14482,2882,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.0357685,12,12


An object of class Seurat 
33538 features across 427 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 9 layers present: counts.1, counts.2, counts.3, counts.4, data.1, data.2, data.3, data.4, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE215120_CM1_AAATGCCCATTACCTT-1,GSE215120,7596,1914,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,3.1200632,11,11
GSE215120_CM1_AACTCCCAGCCGATTT-1,GSE215120,4828,1341,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,0.9113505,11,11
GSE215120_CM1_AACTCCCTCGGCGCAT-1,GSE215120,7064,1684,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,2.5622877,11,11
GSE215120_CM1_AATCCAGTCAGGCCCA-1,GSE215120,10178,2223,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,2.0141482,11,11
GSE215120_CM1_ACACCAAGTCTTCTCG-1,GSE215120,5097,1378,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,0.3923877,11,11
GSE215120_CM1_ACACCCTCAATACGCT-1,GSE215120,7358,1667,tumour,Cutaneous Melanoma,CM1,GSE215120_Cut_MEL_CM1,2.133732,11,11


In [138]:
table(MEL_Ac$sample_type)
table(MEL_Ac$cancer_type)
table(MEL_Ac$patient_id)
table(MEL_Ac$sample_id)

table(MEL_Cu$sample_type)
table(MEL_Cu$cancer_type)
table(MEL_Cu$patient_id)
table(MEL_Cu$sample_id)


tumour 
   787 


Acral Melanoma 
           787 


AM1 AM2 AM3 AM4 AM5 AM6 
260  23 101   9 279 115 


GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM2 GSE215120_Acral_MEL_AM3 
                    260                      23                     101 
GSE215120_Acral_MEL_AM4 GSE215120_Acral_MEL_AM5 GSE215120_Acral_MEL_AM6 
                      9                     279                     115 


LN metastasis        tumour 
          162           265 


Cutaneous Melanoma 
               427 


CM1 CM2 CM3 
295  32 100 


 GSE215120_Cut_MEL_CM1  GSE215120_Cut_MEL_CM2  GSE215120_Cut_MEL_CM3 
                   133                     32                    100 
GSE215120_MEL_mets_CM1 
                   162 

In [139]:
#split cutaneous samples by sample_type
MEL_Cu_Tu <- subset(MEL_Cu, subset = sample_type %in% c("tumour"))
MEL_Cu_LN <- subset(MEL_Cu, subset = sample_type %in% c("LN metastasis"))

#set site and sample_type_major metadata
MEL_Ac@meta.data$site <- "skin"
MEL_Cu_Tu@meta.data$site <- "skin"
MEL_Cu_LN@meta.data$site <- "lymph node"

MEL_Ac@meta.data$sample_type_major <- "primary tumour"
MEL_Cu_Tu@meta.data$sample_type_major <- "primary tumour"
MEL_Cu_LN@meta.data$sample_type_major <- "metastatic tumour"

#set subtype metadata
MEL_Ac@meta.data$cancer_subtype <- "Acral Melanoma"
MEL_Cu_Tu@meta.data$cancer_subtype <- "Melanoma"
MEL_Cu_LN@meta.data$cancer_subtype <- "Melanoma"

#Merge seurat objects back together
MEL <- merge(MEL_Ac, y = c(MEL_Cu_Tu, MEL_Cu_LN), project = "GSE215120")
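
The cell above assigns metadata by subsetting, labelling each piece and merging again. A possible alternative, sketched here on a plain data frame rather than a Seurat object, is to map the existing `sample_type` and `cancer_type` columns to the new values with named lookup vectors. The `meta` data frame and the `*_map` vectors are hypothetical and only mirror the values used in this notebook.

```r
# Hypothetical miniature of the per-cell metadata (one row per cell).
meta <- data.frame(
  sample_type = c("tumour", "tumour", "LN metastasis"),
  cancer_type = c("Acral Melanoma", "Cutaneous Melanoma", "Cutaneous Melanoma"),
  stringsAsFactors = FALSE
)

# Named lookup vectors keyed on the existing columns.
site_map <- c("tumour" = "skin", "LN metastasis" = "lymph node")
major_map <- c("tumour" = "primary tumour", "LN metastasis" = "metastatic tumour")
subtype_map <- c("Acral Melanoma" = "Acral Melanoma", "Cutaneous Melanoma" = "Melanoma")

# Index each lookup by the per-cell value; unname() drops the lookup keys.
meta$site <- unname(site_map[meta$sample_type])
meta$sample_type_major <- unname(major_map[meta$sample_type])
meta$cancer_subtype <- unname(subtype_map[meta$cancer_type])
meta
```

Any value missing from a lookup vector would come back as `NA`, which makes unexpected `sample_type` or `cancer_type` labels easy to spot.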

In [140]:
#set integration_id metadata
MEL@meta.data$integration_id <- MEL@meta.data$sample_id

In [141]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
33538 features across 1214 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 23 layers present: counts.1.1, counts.1.2, counts.2.1, counts.2.2, counts.3.1, counts.3.2, counts.4.1, counts.4.3, counts.5.1, counts.6.1, data.1.1, data.2.1, data.3.1, data.4.1, data.5.1, data.6.1, scale.data.1, data.1.2, data.2.2, data.3.2, scale.data.2, data.4.3, scale.data.3

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE215120_AM1_AAACCTGGTTGCTCCT-1,GSE215120,20298,3789,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,0.9754656,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGATGTCCAAATGC-1,GSE215120,5574,1721,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,6.0459275,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGTAGTCGGTGTTA-1,GSE215120,13432,2759,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,2.1515783,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCCAGAGCCAA-1,GSE215120,17143,2659,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.2249898,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCGTTTGGCGC-1,GSE215120,3603,1012,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,3.6081044,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCTCATGTCCC-1,GSE215120,14482,2882,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.0357685,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1


In [142]:
#exclude any samples with <100 cells
table(MEL$integration_id)
#exclude AM2 (23 cells), AM4 (9 cells) and CM2 (32 cells)
MEL <- subset(MEL, subset = !(integration_id %in% c("GSE215120_Acral_MEL_AM2","GSE215120_Acral_MEL_AM4","GSE215120_Cut_MEL_CM2")))
table(MEL$integration_id)


GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM2 GSE215120_Acral_MEL_AM3 
                    260                      23                     101 
GSE215120_Acral_MEL_AM4 GSE215120_Acral_MEL_AM5 GSE215120_Acral_MEL_AM6 
                      9                     279                     115 
  GSE215120_Cut_MEL_CM1   GSE215120_Cut_MEL_CM2   GSE215120_Cut_MEL_CM3 
                    133                      32                     100 
 GSE215120_MEL_mets_CM1 
                    162 


GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM3 GSE215120_Acral_MEL_AM5 
                    260                     101                     279 
GSE215120_Acral_MEL_AM6   GSE215120_Cut_MEL_CM1   GSE215120_Cut_MEL_CM3 
                    115                     133                     100 
 GSE215120_MEL_mets_CM1 
                    162 

In [143]:
#join layers and then split them by integration_id
Layers(MEL[["RNA"]])
#join layers
MEL[["RNA"]] <- JoinLayers(MEL[["RNA"]])
Layers(MEL[["RNA"]])
#split layers
MEL[["RNA"]] <- split(MEL[["RNA"]], f = MEL$integration_id)
Layers(MEL[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [144]:
#record number of cells
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)
table(MEL$integration_id)

An object of class Seurat 
33538 features across 1150 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 15 layers present: counts.GSE215120_Acral_MEL_AM1, counts.GSE215120_Acral_MEL_AM3, counts.GSE215120_Acral_MEL_AM5, counts.GSE215120_Acral_MEL_AM6, counts.GSE215120_Cut_MEL_CM1, counts.GSE215120_Cut_MEL_CM3, counts.GSE215120_MEL_mets_CM1, scale.data, data.GSE215120_Acral_MEL_AM1, data.GSE215120_Acral_MEL_AM3, data.GSE215120_Acral_MEL_AM5, data.GSE215120_Acral_MEL_AM6, data.GSE215120_Cut_MEL_CM1, data.GSE215120_Cut_MEL_CM3, data.GSE215120_MEL_mets_CM1

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE215120_AM1_AAACCTGGTTGCTCCT-1,GSE215120,20298,3789,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,0.9754656,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGATGTCCAAATGC-1,GSE215120,5574,1721,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,6.0459275,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAAGTAGTCGGTGTTA-1,GSE215120,13432,2759,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,2.1515783,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCCAGAGCCAA-1,GSE215120,17143,2659,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.2249898,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCGTTTGGCGC-1,GSE215120,3603,1012,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,3.6081044,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1
GSE215120_AM1_AAATGCCTCATGTCCC-1,GSE215120,14482,2882,tumour,Acral Melanoma,AM1,GSE215120_Acral_MEL_AM1,1.0357685,12,12,skin,primary tumour,Acral Melanoma,GSE215120_Acral_MEL_AM1


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE215120_CM1_mets_TTGGTTTAGTGCAACG-1,GSE215120,627,355,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,3.668262,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTAGTCAGTTCTCTT-1,GSE215120,2740,748,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,4.19708,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTCACACACCGGAAA-1,GSE215120,1174,529,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,22.146508,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTCATGTCCACTAGA-1,GSE215120,732,395,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,2.322404,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTGGAGGTTGAGAGC-1,GSE215120,4265,1376,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,4.220399,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1
GSE215120_CM1_mets_TTTGGTTTCGAGGCAA-1,GSE215120,671,384,LN metastasis,Cutaneous Melanoma,CM1,GSE215120_MEL_mets_CM1,1.043219,11,11,lymph node,metastatic tumour,Melanoma,GSE215120_MEL_mets_CM1



GSE215120_Acral_MEL_AM1 GSE215120_Acral_MEL_AM3 GSE215120_Acral_MEL_AM5 
                    260                     101                     279 
GSE215120_Acral_MEL_AM6   GSE215120_Cut_MEL_CM1   GSE215120_Cut_MEL_CM3 
                    115                     133                     100 
 GSE215120_MEL_mets_CM1 
                    162 

In [145]:
#re-export seurat object ready for integration
saveRDS(MEL, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE215120_myeloid_int.RDS")

In [146]:
#remove all objects in R
rm(list = ls())

## PRJNA907381

In [3]:
MEL <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/PRJNA907381_myeloid.RDS")

In [4]:
MEL
MEL@project.name
head(MEL@meta.data)

An object of class Seurat 
36601 features across 2723 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
PRJNA907381_MEL022_iLN_AAAGAACCAGCGCGTT-1,PRJNA907381,17285,3704,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,2.100087,6,6
PRJNA907381_MEL022_iLN_AAAGGGCTCCATAGAC-1,PRJNA907381,42925,6544,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.93477,6,6
PRJNA907381_MEL022_iLN_AACAAAGCAAGTATAG-1,PRJNA907381,16549,3569,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,6.689226,6,6
PRJNA907381_MEL022_iLN_AACAAGACAGGATTCT-1,PRJNA907381,18108,3854,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.870775,6,6
PRJNA907381_MEL022_iLN_AACCACATCTTTCCAA-1,PRJNA907381,31754,5097,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.231089,6,6
PRJNA907381_MEL022_iLN_AACCATGAGAAGTCTA-1,PRJNA907381,26158,4705,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.908632,6,6


In [5]:
table(MEL$sample_type)
table(MEL$cancer_type)
table(MEL$patient_id)
table(MEL$sample_id)


      LN mets uninvolved LN 
         1536          1187 


            Healthy Metastatic Melanoma 
               1187                1536 


MEL002 MEL009 MEL014 MEL018 MEL022 
   614    404    743    785    177 


PRJNA907381_MEL002_iLN PRJNA907381_MEL002_uLN PRJNA907381_MEL009_iLN 
                   164                    450                    404 
PRJNA907381_MEL014_iLN PRJNA907381_MEL014_uLN PRJNA907381_MEL018_iLN 
                   422                    321                    369 
PRJNA907381_MEL018_uLN PRJNA907381_MEL022_iLN 
                   416                    177 

In [6]:
#split by cancer_type
MEL_Tu <- subset(MEL, subset = cancer_type %in% c("Metastatic Melanoma"))
MEL_H <- subset(MEL, subset = cancer_type %in% c("Healthy"))

#set site and sample_type_major metadata
MEL_Tu@meta.data$site <- "lymph node"
MEL_H@meta.data$site <- "lymph node"

MEL_Tu@meta.data$sample_type_major <- "metastatic tumour"
MEL_H@meta.data$sample_type_major <- "healthy"

#set subtype metadata
MEL_Tu@meta.data$cancer_subtype <- "Melanoma"
MEL_H@meta.data$cancer_subtype <- "NA"

#Merge seurat objects back together
MEL <- merge(MEL_Tu, y = c(MEL_H), project = "PRJNA907381")
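
The healthy cells above receive the character string `"NA"` as their `cancer_subtype`. In R this is an ordinary string, not a missing value, which matters for any later filtering with `is.na()`. The short sketch below only illustrates the difference; `subtype_string` and `subtype_missing` are hypothetical vectors, and which form is appropriate depends on how the downstream integration code treats the column.

```r
subtype_string <- c("Melanoma", "NA")            # string "NA", as assigned above
subtype_missing <- c("Melanoma", NA_character_)  # hypothetical alternative

# Only the true missing value is detected by is.na().
is.na(subtype_string)
is.na(subtype_missing)

# table() counts the string as a regular level; a true NA is only
# reported when useNA is set.
table(subtype_string, useNA = "ifany")
table(subtype_missing, useNA = "ifany")
```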

In [7]:
#set integration_id metadata
MEL@meta.data$integration_id <- MEL@meta.data$sample_id

In [9]:
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)

An object of class Seurat 
36601 features across 2723 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 18 layers present: counts.1.1, counts.3.1, counts.5.1, counts.6.1, counts.8.1, data.1.1, data.3.1, data.5.1, data.6.1, data.8.1, scale.data.1, counts.2.2, counts.4.2, counts.7.2, data.2.2, data.4.2, data.7.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL022_iLN_AAAGAACCAGCGCGTT-1,PRJNA907381,17285,3704,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,2.100087,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AAAGGGCTCCATAGAC-1,PRJNA907381,42925,6544,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.93477,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAAGCAAGTATAG-1,PRJNA907381,16549,3569,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,6.689226,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAGACAGGATTCT-1,PRJNA907381,18108,3854,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.870775,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCACATCTTTCCAA-1,PRJNA907381,31754,5097,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.231089,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCATGAGAAGTCTA-1,PRJNA907381,26158,4705,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.908632,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL002_uLN_TTTACCACAAATCAAG-1,PRJNA907381,34326,5679,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.629902,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTATGCGTTGCCGCA-1,PRJNA907381,20339,4758,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,6.544078,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCATGTCGACGACC-1,PRJNA907381,30296,5207,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.383285,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCCTCGTTCGAACT-1,PRJNA907381,50410,5915,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,4.558619,16,16,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGAGAGTTCTCTT-1,PRJNA907381,13123,2408,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,15.133735,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGTTAGTGGCGAT-1,PRJNA907381,25688,5435,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,7.4237,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN


In [11]:
#exclude any samples with <100 cells
table(MEL$integration_id)
#none to exclude


PRJNA907381_MEL002_iLN PRJNA907381_MEL002_uLN PRJNA907381_MEL009_iLN 
                   164                    450                    404 
PRJNA907381_MEL014_iLN PRJNA907381_MEL014_uLN PRJNA907381_MEL018_iLN 
                   422                    321                    369 
PRJNA907381_MEL018_uLN PRJNA907381_MEL022_iLN 
                   416                    177 
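
The check above is done by eye. A minimal sketch of the same check in code, using the counts copied from the table output; `n_cells`, `min_cells` and `to_exclude` are hypothetical names introduced for illustration.

```r
# Cell counts per integration_id, copied from the table() output above
n_cells <- c(
  PRJNA907381_MEL002_iLN = 164, PRJNA907381_MEL002_uLN = 450,
  PRJNA907381_MEL009_iLN = 404, PRJNA907381_MEL014_iLN = 422,
  PRJNA907381_MEL014_uLN = 321, PRJNA907381_MEL018_iLN = 369,
  PRJNA907381_MEL018_uLN = 416, PRJNA907381_MEL022_iLN = 177
)

min_cells <- 100

# Names of samples below the threshold (empty for this dataset)
to_exclude <- names(n_cells)[n_cells < min_cells]
print(to_exclude)

# Total should match the number of cells reported for the object
print(sum(n_cells))
```

On the real object, `n_cells` could instead be taken from `table(MEL$integration_id)`, which supports the same `names(...)[... < min_cells]` indexing.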

In [12]:
#join layers and then split them by integration_id
Layers(MEL[["RNA"]])
#join layers
MEL[["RNA"]] <- JoinLayers(MEL[["RNA"]])
Layers(MEL[["RNA"]])
#split layers
MEL[["RNA"]] <- split(MEL[["RNA"]], f = MEL$integration_id)
Layers(MEL[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
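
As the message above states, only the `counts` and `data` layers are split, and `scale.data` stays as a single layer. A minimal sketch of the layer names this should produce, assuming exactly that behaviour; `expected_layers` is a hypothetical helper, not a Seurat function.

```r
# Hypothetical helper: layer names expected after splitting by integration_id,
# assuming counts and data are split per ID and one scale.data layer remains
expected_layers <- function(ids) {
  ids <- unique(ids)
  c(paste0("counts.", ids), paste0("data.", ids), "scale.data")
}

ids <- c(
  "PRJNA907381_MEL002_iLN", "PRJNA907381_MEL002_uLN", "PRJNA907381_MEL009_iLN",
  "PRJNA907381_MEL014_iLN", "PRJNA907381_MEL014_uLN", "PRJNA907381_MEL018_iLN",
  "PRJNA907381_MEL018_uLN", "PRJNA907381_MEL022_iLN"
)

layers <- expected_layers(ids)
print(length(layers))  # 2 * 8 + 1
```

With the object loaded, the result could be compared against `Layers(MEL[["RNA"]])` with `setequal()` to confirm that the layers now correspond to the integration IDs.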



In [13]:
#record number of cells
MEL
MEL@project.name
head(MEL@meta.data)
tail(MEL@meta.data)
table(MEL$integration_id)

An object of class Seurat 
36601 features across 2723 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.PRJNA907381_MEL022_iLN, counts.PRJNA907381_MEL018_iLN, counts.PRJNA907381_MEL014_iLN, counts.PRJNA907381_MEL009_iLN, counts.PRJNA907381_MEL002_iLN, counts.PRJNA907381_MEL018_uLN, counts.PRJNA907381_MEL014_uLN, counts.PRJNA907381_MEL002_uLN, scale.data, data.PRJNA907381_MEL022_iLN, data.PRJNA907381_MEL018_iLN, data.PRJNA907381_MEL014_iLN, data.PRJNA907381_MEL009_iLN, data.PRJNA907381_MEL002_iLN, data.PRJNA907381_MEL018_uLN, data.PRJNA907381_MEL014_uLN, data.PRJNA907381_MEL002_uLN

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL022_iLN_AAAGAACCAGCGCGTT-1,PRJNA907381,17285,3704,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,2.100087,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AAAGGGCTCCATAGAC-1,PRJNA907381,42925,6544,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.93477,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAAGCAAGTATAG-1,PRJNA907381,16549,3569,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,6.689226,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACAAGACAGGATTCT-1,PRJNA907381,18108,3854,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.870775,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCACATCTTTCCAA-1,PRJNA907381,31754,5097,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,3.231089,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN
PRJNA907381_MEL022_iLN_AACCATGAGAAGTCTA-1,PRJNA907381,26158,4705,LN mets,Metastatic Melanoma,MEL022,PRJNA907381_MEL022_iLN,4.908632,6,6,lymph node,metastatic tumour,Melanoma,PRJNA907381_MEL022_iLN


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
PRJNA907381_MEL002_uLN_TTTACCACAAATCAAG-1,PRJNA907381,34326,5679,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.629902,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTATGCGTTGCCGCA-1,PRJNA907381,20339,4758,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,6.544078,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCATGTCGACGACC-1,PRJNA907381,30296,5207,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,3.383285,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTCCTCGTTCGAACT-1,PRJNA907381,50410,5915,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,4.558619,16,16,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGAGAGTTCTCTT-1,PRJNA907381,13123,2408,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,15.133735,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN
PRJNA907381_MEL002_uLN_TTTGGTTAGTGGCGAT-1,PRJNA907381,25688,5435,uninvolved LN,Healthy,MEL002,PRJNA907381_MEL002_uLN,7.4237,6,6,lymph node,healthy,,PRJNA907381_MEL002_uLN



PRJNA907381_MEL002_iLN PRJNA907381_MEL002_uLN PRJNA907381_MEL009_iLN 
                   164                    450                    404 
PRJNA907381_MEL014_iLN PRJNA907381_MEL014_uLN PRJNA907381_MEL018_iLN 
                   422                    321                    369 
PRJNA907381_MEL018_uLN PRJNA907381_MEL022_iLN 
                   416                    177 

In [14]:
#re-export seurat object ready for integration
saveRDS(MEL, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/PRJNA907381_myeloid_int.RDS")

In [15]:
#remove all objects in R
rm(list = ls())

## GSE161529

In [16]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE161529_myeloid.RDS")

In [17]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33538 features across 24082 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 117 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, counts.32, counts.33, counts.34, counts.35, counts.36, counts.37, counts.38, counts.39, counts.40, counts.41, counts.42, counts.43, counts.44, counts.45, counts.46, counts.47, counts.48, counts.49, counts.50, counts.51, counts.52, counts.53, counts.54, counts.55, counts.56, counts.57, counts.58, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE161529_B10023_AAAGCAATCCAGTATG-1,GSE161529,576,348,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,5.555556,1,1
GSE161529_B10023_AACTCCCCACAAGACG-1,GSE161529,1279,534,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.345582,1,1
GSE161529_B10023_AAGGTTCTCCTTGCCA-1,GSE161529,3314,1095,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,4.405552,1,1
GSE161529_B10023_ACATACGGTGGCGAAT-1,GSE161529,2616,922,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.561162,1,1
GSE161529_B10023_ACCCACTCATATGGTC-1,GSE161529,3568,1209,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,3.559417,1,1
GSE161529_B10023_ACTGCTCTCATGCAAC-1,GSE161529,1140,463,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,8.421053,1,1


In [18]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


Healthy breast        LN mets pre-neoplastic         tumour 
          1136            979            287          21680 


      BRCA1 pre-neoplastic                 BRCA1 TNBC 
                       287                       4946 
          ER breast cancer      ER breast cancer mets 
                      9298                        976 
                   Healthy         HER2 breast cancer 
                      1136                       4076 
     male ER breast cancer male ER breast cancer mets 
                      1673                          3 
                      TNBC 
                      1687 


0001 0019 0021 0023 0025 0029 0031 0032 0033 0040 0042 0043 0056 0064 0068 0069 
 452  122    6  117  578  416  905  100   73  637 2116 1454   74  246   10   60 
0090 0092 0095 0106 0114 0125 0126 0131 0135 0151 0161 0163 0167 0169 0173 0176 
 129   50   44  285  494  201  474   49  684  290  161  475  681  433 1610  979 
0177 0178 0230 0233 0275 0288 0308 0319 0337 0342 0360 0372 0554 0894 4031 
3084 1666   22  203    4    2  377  495 1594   57  202  118 1754   40   59 


      GSE161529_BRCA1_TNBC_0131       GSE161529_BRCA1_TNBC_0177 
                             49                            3084 
      GSE161529_BRCA1_TNBC_0554       GSE161529_BRCA1_TNBC_4031 
                           1754                              59 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0056      GSE161529_ER_breast_ER0064 
                             20                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                        

In [19]:
table(BRE$cancer_type)


      BRCA1 pre-neoplastic                 BRCA1 TNBC 
                       287                       4946 
          ER breast cancer      ER breast cancer mets 
                      9298                        976 
                   Healthy         HER2 breast cancer 
                      1136                       4076 
     male ER breast cancer male ER breast cancer mets 
                      1673                          3 
                      TNBC 
                      1687 

In [20]:
#set site metadata, split by sample_type
BRE_H <- subset(BRE, subset = sample_type %in% c("Healthy breast"))
BRE_LN <- subset(BRE, subset = sample_type %in% c("LN mets"))
BRE_pre <- subset(BRE, subset = sample_type %in% c("pre-neoplastic"))
BRE_T <- subset(BRE, subset = sample_type %in% c("tumour"))

BRE_H@meta.data$site <- "breast"
BRE_LN@meta.data$site <- "lymph node"
BRE_pre@meta.data$site <- "breast"
BRE_T@meta.data$site <- "breast"

#set sample_type_major metadata
BRE_H@meta.data$sample_type_major <- "healthy"
BRE_LN@meta.data$sample_type_major <- "metastatic tumour"
BRE_pre@meta.data$sample_type_major <- "pre-neoplastic BRCA1"
BRE_T@meta.data$sample_type_major <- "primary tumour"

#merge back together 
BRE <- merge(BRE_H, y = c(BRE_LN, BRE_pre, BRE_T), project = "GSE161529")

#set cancer_subtype metadata, split by cancer_type 
BRE_pre <- subset(BRE, subset = cancer_type %in% c("BRCA1 pre-neoplastic"))
BRE_B_TNBC <- subset(BRE, subset = cancer_type %in% c("BRCA1 TNBC"))
BRE_ER <- subset(BRE, subset = cancer_type %in% c("ER breast cancer"))
BRE_ER_mets <- subset(BRE, subset = cancer_type %in% c("ER breast cancer mets"))
BRE_H <- subset(BRE, subset = cancer_type %in% c("Healthy"))
BRE_HER2 <- subset(BRE, subset = cancer_type %in% c("HER2 breast cancer"))
BRE_m_ER <- subset(BRE, subset = cancer_type %in% c("male ER breast cancer"))
BRE_m_ER_mets <- subset(BRE, subset = cancer_type %in% c("male ER breast cancer mets"))
BRE_TNBC <- subset(BRE, subset = cancer_type %in% c("TNBC"))

BRE_pre@meta.data$cancer_subtype <- "NA"
BRE_B_TNBC@meta.data$cancer_subtype <- "BRCA1 TNBC" 
BRE_ER@meta.data$cancer_subtype <- "ER Breast Cancer" 
BRE_ER_mets@meta.data$cancer_subtype <- "ER Breast Cancer" 
BRE_H@meta.data$cancer_subtype <- "NA" 
BRE_HER2@meta.data$cancer_subtype <- "HER2 Breast Cancer" 
BRE_m_ER@meta.data$cancer_subtype <- "male ER Breast Cancer" 
BRE_m_ER_mets@meta.data$cancer_subtype <- "male ER Breast Cancer" 
BRE_TNBC@meta.data$cancer_subtype <- "TNBC" 

#merge back together 
BRE <- merge(BRE_pre, y = c(BRE_B_TNBC, BRE_ER, BRE_ER_mets, BRE_H, BRE_HER2, BRE_m_ER, BRE_m_ER_mets, BRE_TNBC), project = "GSE161529")

In [21]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [22]:
BRE
BRE@project.name
head(BRE@meta.data)
tail(BRE@meta.data)

An object of class Seurat 
33538 features across 24082 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 125 layers present: counts.1.3.1, counts.2.3.1, counts.3.3.1, counts.4.3.1, data.1.3.1, data.2.3.1, data.3.3.1, data.4.3.1, scale.data.3.1, counts.55.4.2, counts.56.4.2, counts.57.4.2, counts.58.4.2, data.55.4.2, data.56.4.2, data.57.4.2, data.58.4.2, scale.data.4.2, counts.5.4.3, counts.6.4.3, counts.7.4.3, counts.8.4.3, counts.9.4.3, counts.10.4.3, counts.12.4.3, counts.13.4.3, counts.15.4.3, counts.17.4.3, counts.19.4.3, counts.20.4.3, counts.21.4.3, counts.22.4.3, counts.23.4.3, counts.25.4.3, counts.27.4.3, counts.28.4.3, data.5.4.3, data.6.4.3, data.7.4.3, data.8.4.3, data.9.4.3, data.10.4.3, data.12.4.3, data.13.4.3, data.15.4.3, data.17.4.3, data.19.4.3, data.20.4.3, data.21.4.3, data.22.4.3, data.23.4.3, data.25.4.3, data.27.4.3, data.28.4.3, scale.data.4.3, counts.11.2.4, counts.14.2.4, counts.16.2.4, counts.18.2.4, counts.24.2.4, counts.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_B10023_AAAGCAATCCAGTATG-1,GSE161529,576,348,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,5.555556,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_AACTCCCCACAAGACG-1,GSE161529,1279,534,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.345582,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_AAGGTTCTCCTTGCCA-1,GSE161529,3314,1095,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,4.405552,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_ACATACGGTGGCGAAT-1,GSE161529,2616,922,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,2.561162,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_ACCCACTCATATGGTC-1,GSE161529,3568,1209,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,3.559417,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023
GSE161529_B10023_ACTGCTCTCATGCAAC-1,GSE161529,1140,463,pre-neoplastic,BRCA1 pre-neoplastic,23,GSE161529_pre-neo_B10023,8.421053,1,1,breast,pre-neoplastic BRCA1,,GSE161529_pre-neo_B10023


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_TN0135_TTTCCTCTCGAGAGCA-1,GSE161529,3019,1094,tumour,TNBC,135,GSE161529_TNBC_0135,2.51739,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCCACGGTTTA-1,GSE161529,2429,906,tumour,TNBC,135,GSE161529_TNBC_0135,3.252367,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCGTAAACGCG-1,GSE161529,9095,2480,tumour,TNBC,135,GSE161529_TNBC_0135,5.17867,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGGTTTCGACCAGC-1,GSE161529,3207,1341,tumour,TNBC,135,GSE161529_TNBC_0135,4.70845,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCAAGTCCGTAT-1,GSE161529,4551,1365,tumour,TNBC,135,GSE161529_TNBC_0135,2.966381,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCACACAGGAGT-1,GSE161529,5355,1455,tumour,TNBC,135,GSE161529_TNBC_0135,2.745098,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135


In [24]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude the 21 samples with <100 cells
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE161529_BRCA1_TNBC_0131","GSE161529_BRCA1_TNBC_4031","GSE161529_ER_breast_ER0056","GSE161529_ER_breast_mets_ER0043","GSE161529_ER_breast_mets_ER0056","GSE161529_ER_breast_mets_ER0064","GSE161529_Healthy_breast_0021","GSE161529_Healthy_breast_0023","GSE161529_Healthy_breast_0064","GSE161529_Healthy_breast_0092","GSE161529_Healthy_breast_0095","GSE161529_Healthy_breast_0230","GSE161529_Healthy_breast_0275","GSE161529_Healthy_breast_0288","GSE161529_Healthy_breast_0342","GSE161529_HER2_breast_0069","GSE161529_mER_breast_0068","GSE161529_mER_breast_mets_0068","GSE161529_pre-neo_B10023","GSE161529_pre-neo_B10033","GSE161529_pre-neo_B10894")))
table(BRE$integration_id)


      GSE161529_BRCA1_TNBC_0131       GSE161529_BRCA1_TNBC_0177 
                             49                            3084 
      GSE161529_BRCA1_TNBC_0554       GSE161529_BRCA1_TNBC_4031 
                           1754                              59 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0056      GSE161529_ER_breast_ER0064 
                             20                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                        


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        

In [25]:
#check all categories still present
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


Healthy breast        LN mets pre-neoplastic         tumour 
           876            827            129          21485 


 BRCA1 pre-neoplastic            BRCA1 TNBC      ER breast cancer 
                  129                  4838                  9278 
ER breast cancer mets               Healthy    HER2 breast cancer 
                  827                   876                  4016 
male ER breast cancer                  TNBC 
                 1666                  1687 


0001 0019 0025 0029 0031 0032 0040 0042 0043 0064 0090 0106 0114 0125 0126 0135 
 452  122  578  416  905  100  637 2116 1448  154  129  285  494  201  474  684 
0151 0161 0163 0167 0169 0173 0176 0177 0178 0233 0308 0319 0337 0360 0372 0554 
 290  161  475  681  433 1610  979 3084 1666  203  377  495 1594  202  118 1754 


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        

In [26]:
#only one pre-neoplastic BRCA1 sample and one male ER breast cancer sample remain, so exclude both categories
BRE <- subset(BRE, subset = !(integration_id %in% c("GSE161529_pre-neo_B10090","GSE161529_mER_breast_0178")))
table(BRE$integration_id)


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        

In [28]:
#check what categories still present
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)
table(BRE$site)
table(BRE$cancer_subtype)
table(BRE$sample_type_major)


Healthy breast        LN mets         tumour 
           876            827          19819 


           BRCA1 TNBC      ER breast cancer ER breast cancer mets 
                 4838                  9278                   827 
              Healthy    HER2 breast cancer                  TNBC 
                  876                  4016                  1687 


0001 0019 0025 0029 0031 0032 0040 0042 0043 0064 0106 0114 0125 0126 0135 0151 
 452  122  578  416  905  100  637 2116 1448  154  285  494  201  474  684  290 
0161 0163 0167 0169 0173 0176 0177 0233 0308 0319 0337 0360 0372 0554 
 161  475  681  433 1610  979 3084  203  377  495 1594  202  118 1754 


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
  GSE161529_ER_breast_ER0029_7C   GSE161529_ER_breast_ER0029_9C 
                            248                             168 
     GSE161529_ER_breast_ER0032      GSE161529_ER_breast_ER0040 
                            100                             350 
     GSE161529_ER_breast_ER0042      GSE161529_ER_breast_ER0043 
                           2116                            1448 
     GSE161529_ER_breast_ER0064      GSE161529_ER_breast_ER0114 
                            154                             250 
     GSE161529_ER_breast_ER0125      GSE161529_ER_breast_ER0151 
                            201                             290 
     GSE161529_ER_breast_ER0163      GSE161529_ER_breast_ER0167 
                        


    breast lymph node 
     20695        827 


        BRCA1 TNBC   ER Breast Cancer HER2 Breast Cancer                 NA 
              4838              10105               4016                876 
              TNBC 
              1687 


          healthy metastatic tumour    primary tumour 
              876               827             19819 
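
The earlier decision to drop categories left with a single sample (pre-neoplastic BRCA1 and male ER breast cancer) was made by reading the tables. A minimal sketch of how such categories could be flagged in code, on a toy data frame standing in for the metadata before that exclusion; the rows are hypothetical (one row per cell, only a few cells shown) and `samples_per_type` and `singletons` are names introduced for illustration.

```r
# Toy stand-in for BRE@meta.data before the category exclusion (hypothetical rows)
meta <- data.frame(
  integration_id = c("GSE161529_pre-neo_B10090", "GSE161529_pre-neo_B10090",
                     "GSE161529_mER_breast_0178",
                     "GSE161529_BRCA1_TNBC_0177", "GSE161529_BRCA1_TNBC_0554"),
  cancer_type    = c("BRCA1 pre-neoplastic", "BRCA1 pre-neoplastic",
                     "male ER breast cancer",
                     "BRCA1 TNBC", "BRCA1 TNBC"),
  stringsAsFactors = FALSE
)

# Number of distinct samples contributing to each cancer_type
samples_per_type <- tapply(meta$integration_id, meta$cancer_type,
                           function(x) length(unique(x)))

# Categories represented by fewer than two samples
singletons <- names(samples_per_type)[samples_per_type < 2]
print(singletons)
```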

In [31]:
#two samples are not biologically distinct: GSE161529_ER_breast_ER0029_7C and GSE161529_ER_breast_ER0029_9C
#amend integration_id so they share the same ID

BRE_29 <- subset(BRE, subset = integration_id %in% c("GSE161529_ER_breast_ER0029_7C","GSE161529_ER_breast_ER0029_9C"))
BRE_else <- subset(BRE, subset = !(integration_id %in% c("GSE161529_ER_breast_ER0029_7C","GSE161529_ER_breast_ER0029_9C")))

BRE_29@meta.data$integration_id <- "GSE161529_ER_breast_ER0029"

In [32]:
BRE <- merge(BRE_29, y = c(BRE_else), project = "GSE161529")
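
The relabelling above only changes `integration_id` for two samples. A minimal sketch of the same change as an in-place replacement on a plain character vector; the vector `ids` is a hypothetical stand-in for the `integration_id` column with one entry per cell.

```r
# Toy stand-in for the integration_id column (hypothetical, three cells only)
ids <- c("GSE161529_ER_breast_ER0029_7C",
         "GSE161529_ER_breast_ER0029_9C",
         "GSE161529_ER_breast_ER0032")

# The two samples that are not biologically distinct
replicates <- c("GSE161529_ER_breast_ER0029_7C", "GSE161529_ER_breast_ER0029_9C")

# Give both the same ID; all other entries are left unchanged
ids[ids %in% replicates] <- "GSE161529_ER_breast_ER0029"
print(table(ids))
```

Applied to `BRE@meta.data$integration_id`, this kind of assignment should give the same IDs without a subset and merge, though it has not been run against the object here. `sample_id` keeps the original `_7C` and `_9C` labels either way.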

In [34]:
BRE
table(BRE$integration_id)

An object of class Seurat 
33538 features across 21522 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 72 layers present: counts.GSE161529_ER_breast_ER0029_7C.1, counts.GSE161529_ER_breast_ER0029_9C.1, scale.data.1, data.GSE161529_ER_breast_ER0029_7C.1, data.GSE161529_ER_breast_ER0029_9C.1, counts.GSE161529_BRCA1_TNBC_0177.2, counts.GSE161529_BRCA1_TNBC_0554.2, counts.GSE161529_ER_breast_ER0001.2, counts.GSE161529_ER_breast_ER0025.2, counts.GSE161529_ER_breast_ER0032.2, counts.GSE161529_ER_breast_ER0040.2, counts.GSE161529_ER_breast_ER0042.2, counts.GSE161529_ER_breast_ER0043.2, counts.GSE161529_ER_breast_ER0064.2, counts.GSE161529_ER_breast_ER0114.2, counts.GSE161529_ER_breast_ER0125.2, counts.GSE161529_ER_breast_ER0151.2, counts.GSE161529_ER_breast_ER0163.2, counts.GSE161529_ER_breast_ER0167.2, counts.GSE161529_ER_breast_ER0173.2, counts.GSE161529_ER_breast_ER0319.2, counts.GSE161529_ER_breast_ER0360.2, counts.GSE161529_ER_breast_mets_ER0040.2, cou


      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
     GSE161529_ER_breast_ER0029      GSE161529_ER_breast_ER0032 
                            416                             100 
     GSE161529_ER_breast_ER0040      GSE161529_ER_breast_ER0042 
                            350                            2116 
     GSE161529_ER_breast_ER0043      GSE161529_ER_breast_ER0064 
                           1448                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                            250                             201 
     GSE161529_ER_breast_ER0151      GSE161529_ER_breast_ER0163 
                            290                             475 
     GSE161529_ER_breast_ER0167      GSE161529_ER_breast_ER0173 
                        

In [35]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [36]:
#record number of cells
BRE
BRE@project.name
head(BRE@meta.data)
tail(BRE@meta.data)
table(BRE$integration_id)

An object of class Seurat 
33538 features across 21522 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 69 layers present: counts.GSE161529_ER_breast_ER0029, counts.GSE161529_BRCA1_TNBC_0177, counts.GSE161529_BRCA1_TNBC_0554, counts.GSE161529_ER_breast_ER0001, counts.GSE161529_ER_breast_ER0025, counts.GSE161529_ER_breast_ER0032, counts.GSE161529_ER_breast_ER0040, counts.GSE161529_ER_breast_ER0042, counts.GSE161529_ER_breast_ER0043, counts.GSE161529_ER_breast_ER0064, counts.GSE161529_ER_breast_ER0114, counts.GSE161529_ER_breast_ER0125, counts.GSE161529_ER_breast_ER0151, counts.GSE161529_ER_breast_ER0163, counts.GSE161529_ER_breast_ER0167, counts.GSE161529_ER_breast_ER0173, counts.GSE161529_ER_breast_ER0319, counts.GSE161529_ER_breast_ER0360, counts.GSE161529_ER_breast_mets_ER0040, counts.GSE161529_ER_breast_mets_ER0167, counts.GSE161529_ER_breast_mets_ER0173, counts.GSE161529_Healthy_breast_0019, counts.GSE161529_Healthy_breast_0169, counts.GSE161529_H

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_ER0029_7C_AAAGTAGAGGAGTTGC-1,GSE161529,1592,723,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,4.899497,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AAATGCCAGCTGTCTA-1,GSE161529,812,443,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,2.463054,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AAATGCCTCAAACAAG-1,GSE161529,2514,1017,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,3.778839,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AACACGTTCTTAACCT-1,GSE161529,2614,1052,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,4.20811,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AACCATGGTACCGAGA-1,GSE161529,2055,857,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,7.055961,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029
GSE161529_ER0029_7C_AACTCCCAGGCTAGCA-1,GSE161529,7702,2304,tumour,ER breast cancer,29,GSE161529_ER_breast_ER0029_7C,3.791223,1,1,breast,primary tumour,ER Breast Cancer,GSE161529_ER_breast_ER0029


Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE161529_TN0135_TTTCCTCTCGAGAGCA-1,GSE161529,3019,1094,tumour,TNBC,135,GSE161529_TNBC_0135,2.51739,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCCACGGTTTA-1,GSE161529,2429,906,tumour,TNBC,135,GSE161529_TNBC_0135,3.252367,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGCGCGTAAACGCG-1,GSE161529,9095,2480,tumour,TNBC,135,GSE161529_TNBC_0135,5.17867,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGGTTTCGACCAGC-1,GSE161529,3207,1341,tumour,TNBC,135,GSE161529_TNBC_0135,4.70845,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCAAGTCCGTAT-1,GSE161529,4551,1365,tumour,TNBC,135,GSE161529_TNBC_0135,2.966381,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135
GSE161529_TN0135_TTTGTCACACAGGAGT-1,GSE161529,5355,1455,tumour,TNBC,135,GSE161529_TNBC_0135,2.745098,1,1,breast,primary tumour,TNBC,GSE161529_TNBC_0135



      GSE161529_BRCA1_TNBC_0177       GSE161529_BRCA1_TNBC_0554 
                           3084                            1754 
     GSE161529_ER_breast_ER0001      GSE161529_ER_breast_ER0025 
                            452                             578 
     GSE161529_ER_breast_ER0029      GSE161529_ER_breast_ER0032 
                            416                             100 
     GSE161529_ER_breast_ER0040      GSE161529_ER_breast_ER0042 
                            350                            2116 
     GSE161529_ER_breast_ER0043      GSE161529_ER_breast_ER0064 
                           1448                             154 
     GSE161529_ER_breast_ER0114      GSE161529_ER_breast_ER0125 
                            250                             201 
     GSE161529_ER_breast_ER0151      GSE161529_ER_breast_ER0163 
                            290                             475 
     GSE161529_ER_breast_ER0167      GSE161529_ER_breast_ER0173 
                        

In [37]:
#re-export Seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE161529_myeloid_int.RDS")

In [38]:
#remove all objects in R
rm(list = ls())

## GSE176078

In [3]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE176078_myeloid.RDS")

In [5]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
29733 features across 9374 samples within 1 assay 
Active assay: RNA (29733 features, 2000 variable features)
 53 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,X,percent.mito,subtype,celltype_subset,celltype_minor,celltype_major,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE176078_HER2_CID3586_AACCATGCAGGTCGTC,CID3586,6925,1897,CID3586_AACCATGCAGGTCGTC,2.194946,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.194946,1,1
GSE176078_HER2_CID3586_AACTTTCGTGACCAAG,CID3586,8552,2318,CID3586_AACTTTCGTGACCAAG,2.958372,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.958372,1,1
GSE176078_HER2_CID3586_AAGGTTCAGTCCTCCT,CID3586,9355,2382,CID3586_AAGGTTCAGTCCTCCT,2.501336,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.501336,1,1
GSE176078_HER2_CID3586_ACTATCTGTCTAAAGA,CID3586,16706,2903,CID3586_ACTATCTGTCTAAAGA,4.579193,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,4.579193,1,1
GSE176078_HER2_CID3586_ATTACTCAGACTTTCG,CID3586,9537,2520,CID3586_ATTACTCAGACTTTCG,3.827199,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,3.827199,1,1
GSE176078_HER2_CID3586_CACTCCAGTTCGCTAA,CID3586,9162,2323,CID3586_CACTCCAGTTCGCTAA,2.619515,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.619515,1,1


In [6]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


tumour 
  9374 


  ER Breast Cancer HER2 Breast Cancer               TNBC 
              1691               1341               6342 


 CID3586  CID3838  CID3921  CID3941  CID3946  CID3948  CID3963  CID4040 
     143      445      376       36      157      116      476       47 
 CID4066  CID4067 CID4290A  CID4398 CID44041  CID4461  CID4463  CID4465 
     213      263      339      198      103       44      101      177 
 CID4471  CID4495 CID44971 CID44991  CID4513  CID4515 CID45171  CID4523 
     250      889      608      199     2845      529      164      359 
CID4530N  CID4535 
      87      210 


   GSE176078_ER_breast_CID3941    GSE176078_ER_breast_CID3948 
                            36                            116 
   GSE176078_ER_breast_CID4040    GSE176078_ER_breast_CID4067 
                            47                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4461    GSE176078_ER_breast_CID4463 
                            44                            101 
   GSE176078_ER_breast_CID4471   GSE176078_ER_breast_CID4530N 
                           250                             87 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                        

In [9]:
#set site metadata
BRE@meta.data$site <- "breast"

#set sample_type_major
BRE@meta.data$sample_type_major <- "primary tumour"

In [11]:
#set cancer_subtype metadata
BRE@meta.data$cancer_subtype <- BRE@meta.data$cancer_type

In [13]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [14]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
29733 features across 9374 samples within 1 assay 
Active assay: RNA (29733 features, 2000 variable features)
 53 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,X,percent.mito,subtype,celltype_subset,celltype_minor,celltype_major,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE176078_HER2_CID3586_AACCATGCAGGTCGTC,CID3586,6925,1897,CID3586_AACCATGCAGGTCGTC,2.194946,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.194946,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_AACTTTCGTGACCAAG,CID3586,8552,2318,CID3586_AACTTTCGTGACCAAG,2.958372,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.958372,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_AAGGTTCAGTCCTCCT,CID3586,9355,2382,CID3586_AAGGTTCAGTCCTCCT,2.501336,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.501336,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_ACTATCTGTCTAAAGA,CID3586,16706,2903,CID3586_ACTATCTGTCTAAAGA,4.579193,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,4.579193,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_ATTACTCAGACTTTCG,CID3586,9537,2520,CID3586_ATTACTCAGACTTTCG,3.827199,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,3.827199,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586
GSE176078_HER2_CID3586_CACTCCAGTTCGCTAA,CID3586,9162,2323,CID3586_CACTCCAGTTCGCTAA,2.619515,HER2+,Myeloid_c10_Macrophage_1_EGR1,Macrophage,Myeloid,tumour,HER2 Breast Cancer,CID3586,GSE176078_HER2_breast_CID3586,2.619515,1,1,breast,primary tumour,HER2 Breast Cancer,GSE176078_HER2_breast_CID3586


In [17]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude ER_breast_CID3941, ER_breast_CID4040, ER_breast_CID4461, ER_breast_CID4530N
BRE <- subset(BRE, subset = integration_id %in% c("GSE176078_ER_breast_CID3941","GSE176078_ER_breast_CID4040","GSE176078_ER_breast_CID4461","GSE176078_ER_breast_CID4530N"), invert = TRUE)
table(BRE$integration_id)


   GSE176078_ER_breast_CID3941    GSE176078_ER_breast_CID3948 
                            36                            116 
   GSE176078_ER_breast_CID4040    GSE176078_ER_breast_CID4067 
                            47                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4461    GSE176078_ER_breast_CID4463 
                            44                            101 
   GSE176078_ER_breast_CID4471   GSE176078_ER_breast_CID4530N 
                           250                             87 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                        


   GSE176078_ER_breast_CID3948    GSE176078_ER_breast_CID4067 
                           116                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4463    GSE176078_ER_breast_CID4471 
                           101                            250 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                            164 
        GSE176078_TNBC_CID3946         GSE176078_TNBC_CID3963 
                           157                            476 
       GSE176078_TNBC_CID44041         GSE176078_TNBC_CID4465 
                           103                        
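
The ids to exclude are typed out by hand in the cell above. As a minimal sketch, the same list could be derived from the `table()` output instead. The vector below is a toy stand-in for `BRE$integration_id`, and the names `min_cells`, `small_ids` and `keep_cell` are hypothetical helpers that do not appear elsewhere in this notebook.

```r
# Toy stand-in for BRE$integration_id (hypothetical values, not the real data)
integration_id <- c(rep("sample_A", 150), rep("sample_B", 64), rep("sample_C", 100))

min_cells <- 100                      # threshold used throughout this notebook
cell_counts <- table(integration_id)  # cells per integration_id

# ids with fewer than min_cells cells; a sample with exactly 100 cells is kept
small_ids <- names(cell_counts)[as.vector(cell_counts) < min_cells]

# logical vector, TRUE for cells that belong to a retained sample
keep_cell <- !(integration_id %in% small_ids)

small_ids
table(integration_id[keep_cell])
```

With a real Seurat object, `small_ids` could then replace the hand-typed vector in the `subset()` call. This is only a sketch and has not been run against these datasets.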

In [18]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
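
To illustrate what the join-then-split step does, here is a conceptual sketch on a plain matrix. It is not how SeuratObject implements `split()` internally. The matrix, the ids and the `counts.` prefix are toy assumptions chosen to mirror the layer names printed in this notebook.

```r
# Toy joined counts matrix: 3 genes x 5 cells (hypothetical values)
counts <- matrix(1:15, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:5)))

# Toy stand-in for BRE$integration_id, one entry per cell
integration_id <- c("s1", "s2", "s1", "s2", "s2")

# Group column indices by id, then cut the joined matrix into one block per id
cols_by_id <- split(seq_len(ncol(counts)), integration_id)
layers <- lapply(cols_by_id, function(idx) counts[, idx, drop = FALSE])
names(layers) <- paste0("counts.", names(layers))

names(layers)
sapply(layers, ncol)
```

Each resulting block holds only the cells of one integration_id, which is why the layers end up named after the ids rather than the original numeric suffixes.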



In [20]:
#record number of cells
table(BRE$integration_id)
BRE


   GSE176078_ER_breast_CID3948    GSE176078_ER_breast_CID4067 
                           116                            263 
  GSE176078_ER_breast_CID4290A    GSE176078_ER_breast_CID4398 
                           339                            198 
   GSE176078_ER_breast_CID4463    GSE176078_ER_breast_CID4471 
                           101                            250 
   GSE176078_ER_breast_CID4535  GSE176078_HER2_breast_CID3586 
                           210                            143 
 GSE176078_HER2_breast_CID3838  GSE176078_HER2_breast_CID3921 
                           445                            376 
 GSE176078_HER2_breast_CID4066 GSE176078_HER2_breast_CID45171 
                           213                            164 
        GSE176078_TNBC_CID3946         GSE176078_TNBC_CID3963 
                           157                            476 
       GSE176078_TNBC_CID44041         GSE176078_TNBC_CID4465 
                           103                        

An object of class Seurat 
29733 features across 9160 samples within 1 assay 
Active assay: RNA (29733 features, 2000 variable features)
 45 layers present: data.GSE176078_HER2_breast_CID3586, data.GSE176078_HER2_breast_CID3838, data.GSE176078_HER2_breast_CID3921, data.GSE176078_TNBC_CID3946, data.GSE176078_ER_breast_CID3948, data.GSE176078_TNBC_CID3963, data.GSE176078_HER2_breast_CID4066, data.GSE176078_ER_breast_CID4067, data.GSE176078_ER_breast_CID4290A, data.GSE176078_ER_breast_CID4398, data.GSE176078_ER_breast_CID4463, data.GSE176078_TNBC_CID4465, data.GSE176078_ER_breast_CID4471, data.GSE176078_TNBC_CID4495, data.GSE176078_TNBC_CID4513, data.GSE176078_TNBC_CID4515, data.GSE176078_TNBC_CID4523, data.GSE176078_ER_breast_CID4535, data.GSE176078_TNBC_CID44041, data.GSE176078_TNBC_CID44971, data.GSE176078_TNBC_CID44991, data.GSE176078_HER2_breast_CID45171, scale.data, counts.GSE176078_HER2_breast_CID3586, counts.GSE176078_HER2_breast_CID3838, counts.GSE176078_HER2_breast_CID3921, coun

In [21]:
#re-export Seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE176078_myeloid_int.RDS")

In [22]:
#remove all objects in R
rm(list = ls())

## GSE195861

In [27]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE195861_myeloid.RDS")

In [28]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33538 features across 15286 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 41 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE195861_Healthy_AAACGAACACTGGACC-1,GSE195861,736,427,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,1.766304,4,4
GSE195861_Healthy_AAATGGAGTCCAGGTC-1,GSE195861,1038,585,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,10.693642,1,1
GSE195861_Healthy_AACAAAGAGTCATCCA-1,GSE195861,18003,4395,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,10.381603,1,1
GSE195861_Healthy_AACCATGCACGACAAG-1,GSE195861,1212,825,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,1.732673,1,1
GSE195861_Healthy_AACGTCAGTAGACAAT-1,GSE195861,680,462,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,3.970588,1,1
GSE195861_Healthy_AAGCGTTCAGATAAAC-1,GSE195861,2446,1053,Healthy_breast,Healthy,Norm1,GSE195861_Healthy,2.085037,1,1


In [29]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


Healthy_breast         LN_met         tumour 
           158           1278          13850 


   DCIS Healthy     IDC 
  12436     158    2692 


Norm1   pt1  pt10  pt11  pt12  pt13   pt2   pt3   pt4   pt5   pt6   pt7   pt8 
  158    62   274   131   254   246 10629   432   163   705   172   273   596 
  pt9 
 1191 


GSE195861_DCIS_tumour_pt1 GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 
                       62                     10629                       432 
GSE195861_DCIS_tumour_pt4 GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 
                      163                       705                       172 
GSE195861_DCIS_tumour_pt7         GSE195861_Healthy GSE195861_IDC_LN-met_pt10 
                      273                       158                       197 
GSE195861_IDC_LN-met_pt11 GSE195861_IDC_LN-met_pt12 GSE195861_IDC_LN-met_pt13 
                       97                       145                       155 
 GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 GSE195861_IDC_tumour_pt10 
                      240                       444                        77 
GSE195861_IDC_tumour_pt11 GSE195861_IDC_tumour_pt12 GSE195861_IDC_tumour_pt13 
                       34                       109                        91 
 GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9

In [32]:
#set site and sample_type_major metadata

#split by sample_type
BRE_H <- subset(BRE, subset = sample_type %in% c("Healthy_breast"))
BRE_LN <- subset(BRE, subset = sample_type %in% c("LN_met"))
BRE_T <- subset(BRE, subset = sample_type %in% c("tumour"))

BRE_H@meta.data$site <- "breast"
BRE_LN@meta.data$site <- "lymph node"
BRE_T@meta.data$site <- "breast"

BRE_H@meta.data$sample_type_major <- "healthy"
BRE_LN@meta.data$sample_type_major <- "metastatic tumour"
BRE_T@meta.data$sample_type_major <- "primary tumour"

#merge back together 
BRE <- merge(BRE_H, y = c(BRE_LN, BRE_T), project = "GSE195861")
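
The split-and-merge approach above works, but the object printed further down shows renamed layers (for example `counts.2.3.3.1`) after the merges. A possible alternative, sketched here on a plain data frame standing in for `BRE@meta.data`, is to fill the new columns in place with named lookup vectors. The names `site_map` and `major_map` are hypothetical; the labels are the ones used in the cell above.

```r
# Toy stand-in for BRE@meta.data (hypothetical rows)
meta <- data.frame(
  sample_type = c("Healthy_breast", "LN_met", "tumour", "tumour"),
  stringsAsFactors = FALSE
)

# Lookup vectors keyed by sample_type
site_map  <- c(Healthy_breast = "breast", LN_met = "lymph node", tumour = "breast")
major_map <- c(Healthy_breast = "healthy", LN_met = "metastatic tumour",
               tumour = "primary tumour")

# Index the lookup by each cell's sample_type; unname() drops the key names
meta$site <- unname(site_map[meta$sample_type])
meta$sample_type_major <- unname(major_map[meta$sample_type])

# A sample_type missing from the lookup would produce NA and stop here
stopifnot(!anyNA(meta$site), !anyNA(meta$sample_type_major))
meta
```

Because nothing is subset or merged, cell order and layers would stay untouched. This has not been tested on the Seurat objects in this notebook.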

In [33]:
#set cancer_subtype metadata

#split by cancer_type
BRE_D <- subset(BRE, subset = cancer_type %in% c("DCIS"))
BRE_H <- subset(BRE, subset = cancer_type %in% c("Healthy"))
BRE_I <- subset(BRE, subset = cancer_type %in% c("IDC"))

BRE_D@meta.data$cancer_subtype <- "Breast DCIS"
BRE_H@meta.data$cancer_subtype <- "NA" #this is the string "NA", not a missing value
BRE_I@meta.data$cancer_subtype <- "Breast IDC"

#merge back together 
BRE <- merge(BRE_D, y = c(BRE_H, BRE_I), project = "GSE195861")

In [34]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id
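
Since integration_id decides how layers are grouped later, a quick consistency check may be useful. The sketch below assumes that each integration_id should map to exactly one patient_id and one sample_type_major. That rule is an assumption made for illustration, as are the toy data frame and the helper `n_distinct_per_id`.

```r
# Toy stand-in for BRE@meta.data (hypothetical rows)
meta <- data.frame(
  integration_id    = c("pt8_LN", "pt8_LN", "pt8_T", "pt9_T"),
  patient_id        = c("pt8", "pt8", "pt8", "pt9"),
  sample_type_major = c("metastatic tumour", "metastatic tumour",
                        "primary tumour", "primary tumour"),
  stringsAsFactors = FALSE
)

# Number of distinct values of one column within each integration_id
n_distinct_per_id <- function(column) {
  tapply(meta[[column]], meta$integration_id, function(x) length(unique(x)))
}

patients_per_id <- n_distinct_per_id("patient_id")
types_per_id    <- n_distinct_per_id("sample_type_major")

# Assumed rule: one patient and one sample_type_major per integration_id
stopifnot(all(patients_per_id == 1), all(types_per_id == 1))
patients_per_id
```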

In [35]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33538 features across 15286 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 44 layers present: counts.2.3.3.1, counts.3.3.3.1, counts.4.3.3.1, counts.5.3.3.1, counts.6.3.3.1, counts.7.3.3.1, counts.8.3.3.1, data.2.3.3.1, data.3.3.3.1, data.4.3.3.1, data.5.3.3.1, data.6.3.3.1, data.7.3.3.1, data.8.3.3.1, scale.data.3.3.1, counts.1.1.1.2, data.1.1.1.2, scale.data.1.1.2, counts.15.2.2.3, counts.16.2.2.3, counts.17.2.2.3, counts.18.2.2.3, counts.19.2.2.3, counts.20.2.2.3, data.15.2.2.3, data.16.2.2.3, data.17.2.2.3, data.18.2.2.3, data.19.2.2.3, data.20.2.2.3, scale.data.2.2.3, counts.10.3.3.3, counts.11.3.3.3, counts.12.3.3.3, counts.13.3.3.3, counts.14.3.3.3, counts.9.3.3.3, data.9.3.3.3, data.10.3.3.3, data.11.3.3.3, data.12.3.3.3, data.13.3.3.3, data.14.3.3.3, scale.data.3.3.3

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE195861_DCIS_tumour_pt1_AAAGGATCAAATCAGA-1,GSE195861,95030,7236,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,0.689256,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_AAATGGACAATTGCTG-1,GSE195861,116697,8039,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,4.363437,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_AACCTTTCAGCAATTC-1,GSE195861,65271,7326,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,5.464908,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_ACAAAGACATCTTCGC-1,GSE195861,26262,4457,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,8.27812,1,1,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_ACAAAGAGTGCCTATA-1,GSE195861,862,417,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,6.38051,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1
GSE195861_DCIS_tumour_pt1_ACAAGCTGTGTCATGT-1,GSE195861,4024,1333,tumour,DCIS,pt1,GSE195861_DCIS_tumour_pt1,4.324056,4,4,breast,primary tumour,Breast DCIS,GSE195861_DCIS_tumour_pt1


In [37]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude GSE195861_DCIS_tumour_pt1, GSE195861_IDC_LN-met_pt11, GSE195861_IDC_tumour_pt10, GSE195861_IDC_tumour_pt11, GSE195861_IDC_tumour_pt13
BRE <- subset(BRE, subset = integration_id %in% c("GSE195861_DCIS_tumour_pt1","GSE195861_IDC_LN-met_pt11","GSE195861_IDC_tumour_pt10","GSE195861_IDC_tumour_pt11","GSE195861_IDC_tumour_pt13"), invert = TRUE)
table(BRE$integration_id)


GSE195861_DCIS_tumour_pt1 GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 
                       62                     10629                       432 
GSE195861_DCIS_tumour_pt4 GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 
                      163                       705                       172 
GSE195861_DCIS_tumour_pt7         GSE195861_Healthy GSE195861_IDC_LN-met_pt10 
                      273                       158                       197 
GSE195861_IDC_LN-met_pt11 GSE195861_IDC_LN-met_pt12 GSE195861_IDC_LN-met_pt13 
                       97                       145                       155 
 GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 GSE195861_IDC_tumour_pt10 
                      240                       444                        77 
GSE195861_IDC_tumour_pt11 GSE195861_IDC_tumour_pt12 GSE195861_IDC_tumour_pt13 
                       34                       109                        91 
 GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9


GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 GSE195861_DCIS_tumour_pt4 
                    10629                       432                       163 
GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 GSE195861_DCIS_tumour_pt7 
                      705                       172                       273 
        GSE195861_Healthy GSE195861_IDC_LN-met_pt10 GSE195861_IDC_LN-met_pt12 
                      158                       197                       145 
GSE195861_IDC_LN-met_pt13  GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 
                      155                       240                       444 
GSE195861_IDC_tumour_pt12  GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9 
                      109                       356                       747 

In [38]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [39]:
#record number of cells
table(BRE$integration_id)
BRE


GSE195861_DCIS_tumour_pt2 GSE195861_DCIS_tumour_pt3 GSE195861_DCIS_tumour_pt4 
                    10629                       432                       163 
GSE195861_DCIS_tumour_pt5 GSE195861_DCIS_tumour_pt6 GSE195861_DCIS_tumour_pt7 
                      705                       172                       273 
        GSE195861_Healthy GSE195861_IDC_LN-met_pt10 GSE195861_IDC_LN-met_pt12 
                      158                       197                       145 
GSE195861_IDC_LN-met_pt13  GSE195861_IDC_LN-met_pt8  GSE195861_IDC_LN-met_pt9 
                      155                       240                       444 
GSE195861_IDC_tumour_pt12  GSE195861_IDC_tumour_pt8  GSE195861_IDC_tumour_pt9 
                      109                       356                       747 

An object of class Seurat 
33538 features across 14925 samples within 1 assay 
Active assay: RNA (33538 features, 2000 variable features)
 31 layers present: counts.GSE195861_DCIS_tumour_pt2, counts.GSE195861_DCIS_tumour_pt3, counts.GSE195861_DCIS_tumour_pt4, counts.GSE195861_DCIS_tumour_pt5, counts.GSE195861_DCIS_tumour_pt6, counts.GSE195861_DCIS_tumour_pt7, counts.GSE195861_Healthy, counts.GSE195861_IDC_LN-met_pt8, counts.GSE195861_IDC_LN-met_pt9, counts.GSE195861_IDC_LN-met_pt10, counts.GSE195861_IDC_LN-met_pt12, counts.GSE195861_IDC_LN-met_pt13, counts.GSE195861_IDC_tumour_pt8, counts.GSE195861_IDC_tumour_pt9, counts.GSE195861_IDC_tumour_pt12, scale.data, data.GSE195861_DCIS_tumour_pt2, data.GSE195861_DCIS_tumour_pt3, data.GSE195861_DCIS_tumour_pt4, data.GSE195861_DCIS_tumour_pt5, data.GSE195861_DCIS_tumour_pt6, data.GSE195861_DCIS_tumour_pt7, data.GSE195861_Healthy, data.GSE195861_IDC_LN-met_pt8, data.GSE195861_IDC_LN-met_pt9, data.GSE195861_IDC_LN-met_pt10, data.GSE195861_IDC_LN-

In [40]:
#re-export Seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE195861_myeloid_int.RDS")

In [41]:
#remove all objects in R
rm(list = ls())

## GSE199515

In [3]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE199515_myeloid.RDS")

In [4]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33694 features across 499 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 7 layers present: counts.1, counts.2, counts.3, data.1, data.2, data.3, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE199515_TNBC1_AAACCTGCACGGATAG-1,GSE199515,3229,1175,tumour,TNBC,TNBC1,GSE199515_TNBC1,1.982038,6,6
GSE199515_TNBC1_AAACGGGTCGTAGGTT-1,GSE199515,3356,1058,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.532777,6,6
GSE199515_TNBC1_AAAGATGTCATTGCGA-1,GSE199515,4532,1215,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.250662,6,6
GSE199515_TNBC1_AACTCCCAGTAATCCC-1,GSE199515,4175,999,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.664671,6,6
GSE199515_TNBC1_AACTCCCTCTCTGTCG-1,GSE199515,2921,908,tumour,TNBC,TNBC1,GSE199515_TNBC1,11.982198,6,6
GSE199515_TNBC1_AAGGAGCGTTCCGTCT-1,GSE199515,3657,1308,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.773585,6,6


In [5]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


tumour 
   499 


TNBC 
 499 


TNBC1 TNBC2 TNBC3 
  301    64   134 


GSE199515_TNBC1 GSE199515_TNBC2 GSE199515_TNBC3 
            301              64             134 

In [6]:
#set site metadata
BRE@meta.data$site <- "breast"
#set sample_type_major metadata
BRE@meta.data$sample_type_major <- "primary tumour"
#set cancer_subtype metadata
BRE@meta.data$cancer_subtype <- "TNBC"
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [7]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
33694 features across 499 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 7 layers present: counts.1, counts.2, counts.3, data.1, data.2, data.3, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>,<chr>,<chr>,<chr>,<chr>
GSE199515_TNBC1_AAACCTGCACGGATAG-1,GSE199515,3229,1175,tumour,TNBC,TNBC1,GSE199515_TNBC1,1.982038,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AAACGGGTCGTAGGTT-1,GSE199515,3356,1058,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.532777,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AAAGATGTCATTGCGA-1,GSE199515,4532,1215,tumour,TNBC,TNBC1,GSE199515_TNBC1,2.250662,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AACTCCCAGTAATCCC-1,GSE199515,4175,999,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.664671,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AACTCCCTCTCTGTCG-1,GSE199515,2921,908,tumour,TNBC,TNBC1,GSE199515_TNBC1,11.982198,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1
GSE199515_TNBC1_AAGGAGCGTTCCGTCT-1,GSE199515,3657,1308,tumour,TNBC,TNBC1,GSE199515_TNBC1,3.773585,6,6,breast,primary tumour,TNBC,GSE199515_TNBC1


In [9]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude TNBC2
BRE <- subset(BRE, subset = integration_id %in% c("GSE199515_TNBC2"), invert = TRUE)
table(BRE$integration_id)


GSE199515_TNBC1 GSE199515_TNBC2 GSE199515_TNBC3 
            301              64             134 


GSE199515_TNBC1 GSE199515_TNBC3 
            301             134 

In [10]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [11]:
#record number of cells
table(BRE$integration_id)
BRE


GSE199515_TNBC1 GSE199515_TNBC3 
            301             134 

An object of class Seurat 
33694 features across 435 samples within 1 assay 
Active assay: RNA (33694 features, 2000 variable features)
 5 layers present: data.GSE199515_TNBC1, data.GSE199515_TNBC3, scale.data, counts.GSE199515_TNBC1, counts.GSE199515_TNBC3
 2 dimensional reductions calculated: pca, umap
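
The cell counts are recorded here only as printed output. As a minimal sketch, they could also be written to a file. The ids below are toy values, `cell_record` and `out_file` are hypothetical names, and the temporary file path is a placeholder that a real run would replace.

```r
# Toy stand-in for BRE$integration_id (hypothetical values)
integration_id <- c(rep("GSE_toy_s1", 301), rep("GSE_toy_s3", 134))

# Turn the table into a two-column data frame
cell_record <- as.data.frame(table(integration_id), stringsAsFactors = FALSE)
names(cell_record) <- c("integration_id", "n_cells")

# Written to a temporary file here; a real run would choose its own path
out_file <- tempfile(fileext = ".csv")
write.csv(cell_record, out_file, row.names = FALSE)

cell_record
sum(cell_record$n_cells)
```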

In [12]:
#re-export Seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE199515_myeloid_int.RDS")

In [13]:
#remove all objects in R
rm(list = ls())

## GSE225600

In [14]:
BRE <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE225600_myeloid.RDS")

In [15]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
36601 features across 2135 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 17 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,cancer_type,sample_meta,sample_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE225600_LN_mets_pt2_AAAGGATCAATGAAAC-L2,GSE225600,1343,616,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,5.807893,8,8
GSE225600_LN_mets_pt2_AAAGTGATCAATCGGT-L2,GSE225600,579,439,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,2.417962,8,8
GSE225600_LN_mets_pt2_AACAACCAGCAAGCCA-L2,GSE225600,621,443,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.186795,8,8
GSE225600_LN_mets_pt2_AACACACTCTAATTCC-L2,GSE225600,1096,680,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.379562,8,8
GSE225600_LN_mets_pt2_AACCATGCAACCACGC-L2,GSE225600,407,294,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,13.267813,8,8
GSE225600_LN_mets_pt2_AACCCAAAGTAGTCAA-L2,GSE225600,606,279,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,42.574257,8,8


In [16]:
table(BRE$sample_type)
table(BRE$cancer_type)
table(BRE$patient_id)
table(BRE$sample_id)


LN mets  tumour 
    794    1341 


breast cancer 
         2135 


pt2 pt3 pt6 pt7 
501 180 601 853 


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt3 
                        328                          94 
   GSE225600_BC_LN_mets_pt6    GSE225600_BC_LN_mets_pt7 
                         67                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt3 
                        173                          86 
GSE225600_breast_tumour_pt6 GSE225600_breast_tumour_pt7 
                        534                         548 

In [17]:
#set site and sample_type_major metadata

#split by sample_type
BRE_LN <- subset(BRE, subset = sample_type %in% c("LN mets"))
BRE_T <- subset(BRE, subset = sample_type %in% c("tumour"))

BRE_LN@meta.data$site <- "lymph node"
BRE_T@meta.data$site <- "breast"

BRE_LN@meta.data$sample_type_major <- "metastatic tumour"
BRE_T@meta.data$sample_type_major <- "primary tumour"

#merge back together 
BRE <- merge(BRE_LN, y = c(BRE_T), project = "GSE225600")
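
The split, annotate and merge steps above can also be written as a lookup on the existing `sample_type` column. The sketch below uses a small data frame as a stand-in for `BRE@meta.data` (the rows are hypothetical). On the real object the equivalent assignment would presumably be `BRE$site <- unname(site_map[BRE$sample_type])`; that line is an assumption and has not been run in this notebook. A possible benefit is that no `merge()` is involved, so the layer names and the pca/umap reductions should stay as they were.

```r
# Toy stand-in for BRE@meta.data (hypothetical rows, not real cells)
meta <- data.frame(
  sample_type = c("LN mets", "tumour", "tumour", "LN mets"),
  stringsAsFactors = FALSE
)

# Named lookup vectors: names are sample_type values, values are the new labels
site_map  <- c("LN mets" = "lymph node", "tumour" = "breast")
major_map <- c("LN mets" = "metastatic tumour", "tumour" = "primary tumour")

# Index each lookup by sample_type; unname() drops the carried-over names
meta$site <- unname(site_map[meta$sample_type])
meta$sample_type_major <- unname(major_map[meta$sample_type])

# A sample_type missing from the lookup would appear as NA and stop here
stopifnot(!anyNA(meta$site), !anyNA(meta$sample_type_major))
print(meta)
```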

In [18]:
#set cancer_subtype metadata
BRE@meta.data$cancer_subtype <- "Breast IDC"

In [19]:
#set integration_id metadata
BRE@meta.data$integration_id <- BRE@meta.data$sample_id

In [20]:
BRE
BRE@project.name
head(BRE@meta.data)

An object of class Seurat 
36601 features across 2135 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 18 layers present: counts.1.1, counts.2.1, counts.3.1, counts.4.1, data.1.1, data.2.1, data.3.1, data.4.1, scale.data.1, counts.5.2, counts.6.2, counts.7.2, counts.8.2, data.5.2, data.6.2, data.7.2, data.8.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,cancer_type,sample_meta,sample_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE225600_LN_mets_pt2_AAAGGATCAATGAAAC-L2,GSE225600,1343,616,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,5.807893,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AAAGTGATCAATCGGT-L2,GSE225600,579,439,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,2.417962,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACAACCAGCAAGCCA-L2,GSE225600,621,443,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.186795,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACACACTCTAATTCC-L2,GSE225600,1096,680,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,4.379562,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACCATGCAACCACGC-L2,GSE225600,407,294,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,13.267813,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2
GSE225600_LN_mets_pt2_AACCCAAAGTAGTCAA-L2,GSE225600,606,279,breast cancer,L2,LN mets,pt2,GSE225600_BC_LN_mets_pt2,42.574257,8,8,lymph node,metastatic tumour,Breast IDC,GSE225600_BC_LN_mets_pt2


In [22]:
#exclude any samples with <100 cells
table(BRE$integration_id)
#exclude LN_pt3 (94 cells), LN_pt6 (67 cells), T_pt3 (86 cells)
low_ids <- c("GSE225600_BC_LN_mets_pt3","GSE225600_BC_LN_mets_pt6","GSE225600_breast_tumour_pt3")
BRE <- subset(BRE, cells = colnames(BRE)[!(BRE$integration_id %in% low_ids)])
table(BRE$integration_id)


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt3 
                        328                          94 
   GSE225600_BC_LN_mets_pt6    GSE225600_BC_LN_mets_pt7 
                         67                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt3 
                        173                          86 
GSE225600_breast_tumour_pt6 GSE225600_breast_tumour_pt7 
                        534                         548 


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt7 
                        328                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt6 
                        173                         534 
GSE225600_breast_tumour_pt7 
                        548 
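
The samples to exclude can also be derived from the counts instead of being typed by hand. This is a minimal sketch using the per-sample counts printed above; `min_cells`, `drop_ids` and `keep_ids` are names introduced here for illustration. On the real object `n_cells` would come from `table(BRE$integration_id)`, and the filter could then be something like `subset(BRE, cells = colnames(BRE)[BRE$integration_id %in% keep_ids])` (an assumption, not run here).

```r
# Cell counts per integration_id, copied from the table() output above
n_cells <- c(
  GSE225600_BC_LN_mets_pt2    = 328,
  GSE225600_BC_LN_mets_pt3    = 94,
  GSE225600_BC_LN_mets_pt6    = 67,
  GSE225600_BC_LN_mets_pt7    = 305,
  GSE225600_breast_tumour_pt2 = 173,
  GSE225600_breast_tumour_pt3 = 86,
  GSE225600_breast_tumour_pt6 = 534,
  GSE225600_breast_tumour_pt7 = 548
)

min_cells <- 100  # threshold used throughout this notebook

# integration_ids that fall below the threshold, and those that pass
drop_ids <- names(n_cells)[n_cells < min_cells]
keep_ids <- setdiff(names(n_cells), drop_ids)

print(drop_ids)
print(keep_ids)
```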

In [23]:
#join layers and then split them by integration_id
Layers(BRE[["RNA"]])
#join layers
BRE[["RNA"]] <- JoinLayers(BRE[["RNA"]])
Layers(BRE[["RNA"]])
#split layers
BRE[["RNA"]] <- split(BRE[["RNA"]], f = BRE$integration_id)
Layers(BRE[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.
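
Conceptually, joining collapses the per-sample layers into one matrix, and splitting then partitions its columns (cells) by `integration_id`. The base R sketch below illustrates only that partitioning idea on a toy matrix; it is not how Seurat implements `split()`, and the gene, cell and sample names are hypothetical.

```r
# Toy counts matrix: 3 genes x 6 cells (hypothetical values)
counts <- matrix(
  1:18, nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Hypothetical integration_id for each cell, in column order
integration_id <- c("s1", "s1", "s2", "s2", "s2", "s1")

# Group column indices by id, then take the matching columns of the matrix
idx <- split(seq_len(ncol(counts)), integration_id)
layers <- lapply(idx, function(i) counts[, i, drop = FALSE])
names(layers) <- paste0("counts.", names(layers))

# Every cell lands in exactly one per-sample matrix
stopifnot(sum(vapply(layers, ncol, integer(1))) == ncol(counts))
print(names(layers))
```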



In [25]:
#record number of cells
table(BRE$integration_id)
BRE@project.name
BRE


   GSE225600_BC_LN_mets_pt2    GSE225600_BC_LN_mets_pt7 
                        328                         305 
GSE225600_breast_tumour_pt2 GSE225600_breast_tumour_pt6 
                        173                         534 
GSE225600_breast_tumour_pt7 
                        548 

An object of class Seurat 
36601 features across 1888 samples within 1 assay 
Active assay: RNA (36601 features, 2000 variable features)
 11 layers present: counts.GSE225600_BC_LN_mets_pt2, counts.GSE225600_BC_LN_mets_pt7, counts.GSE225600_breast_tumour_pt2, counts.GSE225600_breast_tumour_pt6, counts.GSE225600_breast_tumour_pt7, scale.data, data.GSE225600_BC_LN_mets_pt2, data.GSE225600_BC_LN_mets_pt7, data.GSE225600_breast_tumour_pt2, data.GSE225600_breast_tumour_pt6, data.GSE225600_breast_tumour_pt7
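
As a quick consistency check on the recorded numbers, the per-sample counts should add up to the object total, and the total should have dropped by exactly the excluded cells. The sketch below uses the values printed in this section; on the real object a comparable check might be `sum(table(BRE$integration_id)) == ncol(BRE)` (an assumption, not run here).

```r
# Counts copied from the table() outputs in this section
kept    <- c(LN_pt2 = 328, LN_pt7 = 305, T_pt2 = 173, T_pt6 = 534, T_pt7 = 548)
dropped <- c(LN_pt3 = 94, LN_pt6 = 67, T_pt3 = 86)

total_before <- 2135  # cells in the object as loaded
total_after  <- 1888  # cells reported after filtering

# Kept samples account for every remaining cell
stopifnot(sum(kept) == total_after)
# The drop in total equals the cells in the excluded samples
stopifnot(total_before - sum(dropped) == total_after)

cat("cells kept:", sum(kept), "\n")
```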

In [26]:
#re-export seurat object ready for integration
saveRDS(BRE, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE225600_myeloid_int.RDS")

In [27]:
#remove all objects in R
rm(list = ls())

## GSE162498

In [28]:
LUNG <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE162498_myeloid.RDS")

In [29]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
45068 features across 28781 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 27 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, scale.data
 2 dimensional reductions calculated: pca, umap

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE162498_NSCLC_P34_AAACGGGCATAGTAAG-1,GSE162498,980,450,tumour,NSCLC,P34,GSE162498_NSCLC_P34,8.265306,4,4
GSE162498_NSCLC_P34_AAAGTAGCAAACTGCT-1,GSE162498,11484,2224,tumour,NSCLC,P34,GSE162498_NSCLC_P34,3.692093,8,8
GSE162498_NSCLC_P34_AAATGCCCAGTCGATT-1,GSE162498,524,200,tumour,NSCLC,P34,GSE162498_NSCLC_P34,17.557252,4,4
GSE162498_NSCLC_P34_AAATGCCGTCACACGC-1,GSE162498,3104,300,tumour,NSCLC,P34,GSE162498_NSCLC_P34,83.82732,9,9
GSE162498_NSCLC_P34_AACACGTAGATACACA-1,GSE162498,604,311,tumour,NSCLC,P34,GSE162498_NSCLC_P34,17.218543,4,4
GSE162498_NSCLC_P34_AACACGTCAGGGTACA-1,GSE162498,15436,3018,tumour,NSCLC,P34,GSE162498_NSCLC_P34,5.93418,8,8


In [30]:
table(LUNG$sample_type)
table(LUNG$cancer_type)
table(LUNG$patient_id)
table(LUNG$sample_id)


adjacent healthy           tumour 
            2024            26757 


healthy   NSCLC 
   2024   26757 


 P34  P35  P42  P43  P46  P47  P55  P57  P58  P60  P61 
 482  685 4404 1868  999 7144 1579 5809  680 2654 2477 


GSE162498_healthy_P60 GSE162498_healthy_P61   GSE162498_NSCLC_P34 
                  809                  1215                   482 
  GSE162498_NSCLC_P35   GSE162498_NSCLC_P42   GSE162498_NSCLC_P43 
                  685                  4404                  1868 
  GSE162498_NSCLC_P46   GSE162498_NSCLC_P47   GSE162498_NSCLC_P55 
                  999                  7144                  1579 
  GSE162498_NSCLC_P57   GSE162498_NSCLC_P58   GSE162498_NSCLC_P60 
                 5809                   680                  1845 
  GSE162498_NSCLC_P61 
                 1262 
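
Because `integration_id` is meant to group samples that are not biologically distinct, it can help to cross-tabulate patient against tissue to see which patients contribute more than one sample. The sketch below rebuilds a toy metadata table for patients P60 and P61 from the counts above; parsing patient and tissue out of `sample_id` assumes the `GSE162498_<tissue>_<patient>` naming seen in this dataset. On the real object the same view would presumably be `table(LUNG$patient_id, LUNG$cancer_type)`.

```r
# Per-sample cell counts for two patients, copied from the table() output above
n_cells <- c(
  GSE162498_healthy_P60 = 809,
  GSE162498_NSCLC_P60   = 1845,
  GSE162498_healthy_P61 = 1215,
  GSE162498_NSCLC_P61   = 1262
)

# One row per cell; patient and tissue are parsed from the sample_id
meta <- data.frame(sample_id = rep(names(n_cells), times = n_cells))
meta$patient_id  <- sub(".*_", "", meta$sample_id)
meta$cancer_type <- sub("^GSE162498_(.*)_P[0-9]+$", "\\1", meta$sample_id)

# Patients with counts in both columns contribute paired samples
tab <- table(meta$patient_id, meta$cancer_type)
print(tab)

# Row totals should match the per-patient counts reported above
stopifnot(rowSums(tab)[["P60"]] == 2654, rowSums(tab)[["P61"]] == 2477)
```

Here the paired samples are adjacent healthy versus tumour tissue from the same patient.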

In [31]:
#set site metadata
LUNG@meta.data$site <- "lung"

#set sample_type_major metadata

#split by cancer_type
LUNG_H <- subset(LUNG, subset = cancer_type %in% c("healthy"))
LUNG_T <- subset(LUNG, subset = cancer_type %in% c("NSCLC"))

LUNG_H@meta.data$sample_type_major <- "healthy"
LUNG_T@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
LUNG_H@meta.data$cancer_subtype <- "NA"
LUNG_T@meta.data$cancer_subtype <- "NSCLC"

#merge back together 
LUNG <- merge(LUNG_H, y = c(LUNG_T), project = "GSE162498")

In [32]:
#set integration_id metadata
LUNG@meta.data$integration_id <- LUNG@meta.data$sample_id

In [33]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
45068 features across 28781 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 28 layers present: counts.10.1, counts.12.1, data.10.1, data.12.1, scale.data.1, counts.1.2, counts.2.2, counts.3.2, counts.4.2, counts.5.2, counts.6.2, counts.7.2, counts.8.2, counts.9.2, counts.11.2, counts.13.2, data.1.2, data.2.2, data.3.2, data.4.2, data.5.2, data.6.2, data.7.2, data.8.2, data.9.2, data.11.2, data.13.2, scale.data.2

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters,site,sample_type_major,cancer_subtype,integration_id
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>,<chr>
GSE162498_Healthy_P60_AAACCTGGTTCCACTC-1,GSE162498,1608,763,adjacent healthy,healthy,P60,GSE162498_healthy_P60,9.328358,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAACCTGTCTCCGGTT-1,GSE162498,1054,502,adjacent healthy,healthy,P60,GSE162498_healthy_P60,21.157495,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAACGGGCAATAGAGT-1,GSE162498,1262,718,adjacent healthy,healthy,P60,GSE162498_healthy_P60,13.946117,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAACGGGGTTGTGGCC-1,GSE162498,1373,790,adjacent healthy,healthy,P60,GSE162498_healthy_P60,13.692644,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAAGATGCAGTCAGAG-1,GSE162498,853,520,adjacent healthy,healthy,P60,GSE162498_healthy_P60,13.130129,4,4,lung,healthy,,GSE162498_healthy_P60
GSE162498_Healthy_P60_AAAGATGGTCAAAGCG-1,GSE162498,583,358,adjacent healthy,healthy,P60,GSE162498_healthy_P60,1.02916,6,6,lung,healthy,,GSE162498_healthy_P60


In [35]:
#exclude any samples with <100 cells
table(LUNG$integration_id)
#none to exclude 
#LUNG <- subset(LUNG, !(subset = integration_id %in% c("")))
#table(LUNG$integration_id)


GSE162498_healthy_P60 GSE162498_healthy_P61   GSE162498_NSCLC_P34 
                  809                  1215                   482 
  GSE162498_NSCLC_P35   GSE162498_NSCLC_P42   GSE162498_NSCLC_P43 
                  685                  4404                  1868 
  GSE162498_NSCLC_P46   GSE162498_NSCLC_P47   GSE162498_NSCLC_P55 
                  999                  7144                  1579 
  GSE162498_NSCLC_P57   GSE162498_NSCLC_P58   GSE162498_NSCLC_P60 
                 5809                   680                  1845 
  GSE162498_NSCLC_P61 
                 1262 

In [36]:
#join layers and then split them by integration_id
Layers(LUNG[["RNA"]])
#join layers
LUNG[["RNA"]] <- JoinLayers(LUNG[["RNA"]])
Layers(LUNG[["RNA"]])
#split layers
LUNG[["RNA"]] <- split(LUNG[["RNA"]], f = LUNG$integration_id)
Layers(LUNG[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [38]:
#record number of cells
table(LUNG$integration_id)
LUNG
LUNG@project.name


GSE162498_healthy_P60 GSE162498_healthy_P61   GSE162498_NSCLC_P34 
                  809                  1215                   482 
  GSE162498_NSCLC_P35   GSE162498_NSCLC_P42   GSE162498_NSCLC_P43 
                  685                  4404                  1868 
  GSE162498_NSCLC_P46   GSE162498_NSCLC_P47   GSE162498_NSCLC_P55 
                  999                  7144                  1579 
  GSE162498_NSCLC_P57   GSE162498_NSCLC_P58   GSE162498_NSCLC_P60 
                 5809                   680                  1845 
  GSE162498_NSCLC_P61 
                 1262 

An object of class Seurat 
45068 features across 28781 samples within 1 assay 
Active assay: RNA (45068 features, 2000 variable features)
 27 layers present: counts.GSE162498_healthy_P60, counts.GSE162498_healthy_P61, counts.GSE162498_NSCLC_P34, counts.GSE162498_NSCLC_P35, counts.GSE162498_NSCLC_P42, counts.GSE162498_NSCLC_P43, counts.GSE162498_NSCLC_P46, counts.GSE162498_NSCLC_P47, counts.GSE162498_NSCLC_P55, counts.GSE162498_NSCLC_P57, counts.GSE162498_NSCLC_P58, counts.GSE162498_NSCLC_P60, counts.GSE162498_NSCLC_P61, scale.data, data.GSE162498_healthy_P60, data.GSE162498_healthy_P61, data.GSE162498_NSCLC_P34, data.GSE162498_NSCLC_P35, data.GSE162498_NSCLC_P42, data.GSE162498_NSCLC_P43, data.GSE162498_NSCLC_P46, data.GSE162498_NSCLC_P47, data.GSE162498_NSCLC_P55, data.GSE162498_NSCLC_P57, data.GSE162498_NSCLC_P58, data.GSE162498_NSCLC_P60, data.GSE162498_NSCLC_P61

In [39]:
#re-export seurat object ready for integration
saveRDS(LUNG, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE162498_myeloid_int.RDS")

In [40]:
#remove all objects in R
rm(list = ls())

## GSE131907

* note: these samples were originally labelled LUAD. The paper describes LUAD as the most common subtype of NSCLC, so for consistency with the other datasets cancer_subtype is set to NSCLC here.

In [41]:
LUNG <- readRDS("/scratch/user/s4436039/scdata/Myeloid_Cells/GSE131907_myeloid.RDS")

In [43]:
LUNG
LUNG@project.name
head(LUNG@meta.data)

An object of class Seurat 
29634 features across 36524 samples within 1 assay 
Active assay: RNA (29634 features, 2000 variable features)
 87 layers present: counts.1, counts.2, counts.3, counts.4, counts.5, counts.6, counts.7, counts.8, counts.9, counts.10, counts.11, counts.12, counts.13, counts.14, counts.15, counts.16, counts.17, counts.18, counts.19, counts.20, counts.21, counts.22, counts.23, counts.24, counts.25, counts.26, counts.27, counts.28, counts.29, counts.30, counts.31, counts.32, counts.33, counts.34, counts.35, counts.36, counts.37, counts.38, counts.39, counts.40, counts.41, counts.42, counts.43, data.1, data.2, data.3, data.4, data.5, data.6, data.7, data.8, data.9, data.10, data.11, data.12, data.13, data.14, data.15, data.16, data.17, data.18, data.19, data.20, data.21, data.22, data.23, data.24, data.25, data.26, data.27, data.28, data.29, data.30, data.31, data.32, data.33, data.34, data.35, data.36, data.37, data.38, data.39, data.40, data.41, data.42, data.43, 

Unnamed: 0_level_0,orig.ident,nCount_RNA,nFeature_RNA,sample_meta,sample_type,cancer_type,patient_id,sample_id,percent.mt,RNA_snn_res.0.2,seurat_clusters
Unnamed: 0_level_1,<chr>,<dbl>,<int>,<chr>,<chr>,<chr>,<chr>,<chr>,<dbl>,<fct>,<fct>
GSE131907_LUAD_Tu_T0006_AAACCTGAGTTGCAGG_LUNG_T06,GSE131907,1695,711,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,9.498525,1,1
GSE131907_LUAD_Tu_T0006_AAACCTGTCCAGAAGG_LUNG_T06,GSE131907,9826,2260,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,2.869937,1,1
GSE131907_LUAD_Tu_T0006_AAAGATGTCTCATTCA_LUNG_T06,GSE131907,13178,3079,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,4.932463,1,1
GSE131907_LUAD_Tu_T0006_AAAGCAAGTAATTGGA_LUNG_T06,GSE131907,6779,1826,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,4.912229,1,1
GSE131907_LUAD_Tu_T0006_AAATGCCCATTACGAC_LUNG_T06,GSE131907,24381,4356,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,4.335343,1,1
GSE131907_LUAD_Tu_T0006_AAATGCCTCGTGGGAA_LUNG_T06,GSE131907,9638,2226,LUNG_T06,tumour,LUAD,Pt_0006,GSE131907_LUAD_T0006,3.662586,1,1


In [45]:
table(LUNG$sample_type)
table(LUNG$cancer_type)
table(LUNG$patient_id)
table(LUNG$sample_id)


  brain mets Healthy Lung      LN mets       tumour 
        5405        16531         4953         9635 


Healthy    LUAD 
  16531   19993 


Pt_0001 Pt_0006 Pt_0008 Pt_0009 Pt_0018 Pt_0019 Pt_0020 Pt_0025 Pt_0028 Pt_0030 
   1311    1825    1581    2608    3021    2302    4368     581    1760    1890 
Pt_0031 Pt_0034 Pt_1006 Pt_1010 Pt_1011 Pt_1012 Pt_1013 Pt_1015 Pt_1019 Pt_1028 
   1737    2016     368     766     732    2062     138     310     322     100 
Pt_1049 Pt_1051 Pt_1058 Pt_3002 Pt_3003 Pt_3004 Pt_3006 Pt_3007 Pt_3012 Pt_3013 
    272     623     426     812     101     328     433     273     259     552 
Pt_3016 Pt_3017 Pt_3019 
    177     695    1775 


GSE131907_Healthy_N0001 GSE131907_Healthy_N0006 GSE131907_Healthy_N0008 
                   1311                    1272                    1324 
GSE131907_Healthy_N0009 GSE131907_Healthy_N0018 GSE131907_Healthy_N0019 
                   1144                    2050                    1144 
GSE131907_Healthy_N0020 GSE131907_Healthy_N0028 GSE131907_Healthy_N0030 
                   3489                     783                     955 
GSE131907_Healthy_N0031 GSE131907_Healthy_N0034    GSE131907_LUAD_B3002 
                   1337                    1722                     812 
   GSE131907_LUAD_B3003    GSE131907_LUAD_B3004    GSE131907_LUAD_B3006 
                    101                     328                     433 
   GSE131907_LUAD_B3007    GSE131907_LUAD_B3012    GSE131907_LUAD_B3013 
                    273                     259                     552 
   GSE131907_LUAD_B3016    GSE131907_LUAD_B3017    GSE131907_LUAD_B3019 
                    177                     695   

In [47]:
#split by sample_type
LUNG_B <- subset(LUNG, subset = sample_type %in% c("brain mets"))
LUNG_H <- subset(LUNG, subset = sample_type %in% c("Healthy Lung"))
LUNG_LN <- subset(LUNG, subset = sample_type %in% c("LN mets"))
LUNG_T <- subset(LUNG, subset = sample_type %in% c("tumour"))

#set site metadata
LUNG_B@meta.data$site <- "brain"
LUNG_H@meta.data$site <- "lung"
LUNG_LN@meta.data$site <- "lymph node"
LUNG_T@meta.data$site <- "lung"

#set sample_type_major metadata
LUNG_B@meta.data$sample_type_major <- "metastatic tumour"
LUNG_H@meta.data$sample_type_major <- "healthy"
LUNG_LN@meta.data$sample_type_major <- "metastatic tumour"
LUNG_T@meta.data$sample_type_major <- "primary tumour"

#set cancer_subtype metadata
LUNG_B@meta.data$cancer_subtype <- "NSCLC"
LUNG_H@meta.data$cancer_subtype <- "NA"
LUNG_LN@meta.data$cancer_subtype <- "NSCLC"
LUNG_T@meta.data$cancer_subtype <- "NSCLC"

#merge back together 
LUNG <- merge(LUNG_B, y = c(LUNG_H, LUNG_LN, LUNG_T), project = "GSE131907")

# up to here

In [None]:
#set integration_id metadata
LUNG@meta.data$integration_id <- LUNG@meta.data$sample_id

In [None]:
LUNG
LUNG@project.name
head(LUNG@meta.data)



In [None]:
#exclude any samples with <100 cells
table(LUNG$integration_id)
#list any samples below the threshold here once the table has been checked
#LUNG <- subset(LUNG, !(subset = integration_id %in% c("")))
#table(LUNG$integration_id)



In [None]:
#join layers and then split them by integration_id
Layers(LUNG[["RNA"]])
#join layers
LUNG[["RNA"]] <- JoinLayers(LUNG[["RNA"]])
Layers(LUNG[["RNA"]])
#split layers
LUNG[["RNA"]] <- split(LUNG[["RNA"]], f = LUNG$integration_id)
Layers(LUNG[["RNA"]])


Splitting ‘counts’, ‘data’ layers. Not splitting ‘scale.data’. If you would like to split other layers, set in `layers` argument.



In [None]:
#record number of cells
table(LUNG$integration_id)
LUNG
LUNG@project.name



In [None]:
#re-export seurat object ready for integration
saveRDS(LUNG, "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate/GSE131907_myeloid_int.RDS")
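
Because the export cell is copied from dataset to dataset, the accession in the file name is easy to leave unchanged. One way to reduce that risk is to build the path from the accession itself. `int_path()` below is a hypothetical helper, not part of Seurat. With the real object it might be called as `saveRDS(LUNG, int_path(LUNG@project.name))`, assuming `project.name` holds the accession, as set by the `merge(..., project = ...)` calls in this notebook.

```r
# Hypothetical helper: derive the export path from the dataset accession
int_path <- function(accession,
                     out_dir = "/scratch/user/s4436039/scdata/Myeloid_Cells/Myeloid_Cells_Integrate") {
  file.path(out_dir, paste0(accession, "_myeloid_int.RDS"))
}

out_file <- int_path("GSE131907")
print(out_file)

# Optional guard before saving: warn if an export with this name already exists
if (file.exists(out_file)) {
  warning("export already exists and would be overwritten: ", out_file)
}
```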

In [None]:
#remove all objects in R
rm(list = ls())