aggregate_cells takes too long #110

@MaximilianNuber

Dear Dr. Mangiola,

Thank you for the very nice package. I am working with large-scale single-cell RNA-seq data and want to use tidySingleCellExperiment.
I discovered that aggregate_cells takes much longer than aggregateAcrossCells.

As I am usually working on a server, I recreated the problem with a 225k cell dataset on my laptop:
https://cellxgene.cziscience.com/e/dea717d4-7bc0-4e46-950f-fd7e1cc8df7d.cxg/

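library(Seurat)                    # as.SingleCellExperiment()
library(scuttle)                   # aggregateAcrossCells()
library(tidySingleCellExperiment)  # aggregate_cells()
library(tidySummarizedExperiment)
#setwd("/Users/maximiliannuber/Documents/CSAMA_2024")
# The .rds file holds a Seurat object, which is converted to a SingleCellExperiment
sce <- readr::read_rds("Seurat_kidney.rds")
sce <- as.SingleCellExperiment(sce)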

aggregateAcrossCells runs fast:

system.time(pbulk <- aggregateAcrossCells(sce, ids = colData(sce)[, c("donor_id", "cell_type")]))
 user  system elapsed 
 11.690   2.481  16.056 

This code, using aggregate_cells with the same grouping, ran for a very long time, and I interrupted it after about 10 minutes.

system.time(
        pbulk <- sce %>%
        aggregate_cells(c(donor_id, cell_type), assays="counts")
        )

I looked at this with Michael Love, and we found that the slowdown may be related to the combination of donor and cell type.
Aggregating by cell type alone took just a few seconds:

system.time(
        pbulk <- sce %>%
        aggregate_cells(cell_type, assays="counts")
        )
 user  system elapsed 
 10.164   2.333  13.953 
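
As a rough illustration of what changes between the two calls, here is a minimal base R sketch on toy data (made-up donor and cell type labels, not the kidney dataset) that counts how many groups each grouping produces. Whether the larger number of groups is what actually slows down aggregate_cells is only an assumption and has not been confirmed.

```r
# Toy metadata standing in for colData(sce); all labels are hypothetical.
meta <- expand.grid(
  donor_id  = paste0("donor", 1:5),
  cell_type = paste0("type", 1:4)
)
# Repeat each donor/cell type pair 50 times to mimic 1000 cells.
meta <- meta[rep(seq_len(nrow(meta)), times = 50), ]

# Number of groups when aggregating by cell type only.
n_by_type <- length(unique(meta$cell_type))

# Number of groups when aggregating by donor and cell type together.
n_by_both <- nrow(unique(meta[, c("donor_id", "cell_type")]))

print(c(cell_type = n_by_type, donor_and_cell_type = n_by_both))
```

On the real object, the same count could be taken from colData(sce) to see how many donor and cell type combinations are present.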

Thank you for any help!

Output of sessionInfo():

R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidySummarizedExperiment_1.14.0 ttservice_0.4.1                
 [3] tidyr_1.3.1                     tidySingleCellExperiment_1.14.0
 [5] muscData_1.18.0                 ExperimentHub_2.12.0           
 [7] AnnotationHub_3.12.0            BiocFileCache_2.12.0           
 [9] dbplyr_2.5.0                    rpx_2.12.0                     
[11] edgeR_4.2.0                     stringr_1.5.1                  
[13] pheatmap_1.0.12                 celldex_1.14.0                 
[15] SingleR_2.6.0                   igraph_2.0.3                   
[17] GGally_2.2.1                    NewWave_1.14.0                 
[19] scry_1.16.0                     scDblFinder_1.18.0             
[21] scran_1.32.0                    scater_1.32.0                  
[23] ggplot2_3.5.1                   EnsDb.Hsapiens.v86_2.99.0      
[25] ensembldb_2.28.0                AnnotationFilter_1.28.0        
[27] GenomicFeatures_1.56.0          AnnotationDbi_1.66.0           
[29] scuttle_1.14.0                  DropletUtils_1.24.0            
[31] SingleCellExperiment_1.26.0     SummarizedExperiment_1.34.0    
[33] GenomicRanges_1.56.0            GenomeInfoDb_1.40.0            
[35] IRanges_2.38.0                  S4Vectors_0.42.0               
[37] MatrixGenerics_1.16.0           matrixStats_1.3.0              
[39] DropletTestFiles_1.14.0         dplyr_1.1.4                    
[41] limma_3.60.3                    RcppSpdlog_0.0.17              
[43] Seurat_5.0.3                    cellxgene.census_1.14.1        
[45] SeuratObject_5.0.1              sp_2.1-4                       
[47] GEOquery_2.72.0                 Biobase_2.64.0                 
[49] BiocGenerics_0.50.0            

loaded via a namespace (and not attached):
  [1] R.methodsS3_1.8.2         vroom_1.6.5               RcppCCTZ_0.2.12          
  [4] spdl_0.0.5                goftest_1.2-3             Biostrings_2.72.1        
  [7] HDF5Array_1.32.0          vctrs_0.6.5               spatstat.random_3.2-3    
 [10] digest_0.6.35             png_0.1-8                 aws.signature_0.6.0      
 [13] gypsum_1.0.1              tiledb_0.27.0             ggrepel_0.9.5            
 [16] deldir_2.0-4              parallelly_1.37.1         MASS_7.3-60.2            
 [19] reshape2_1.4.4            httpuv_1.6.15             withr_3.0.0              
 [22] xfun_0.43                 aws.s3_0.3.21             ellipsis_0.3.2           
 [25] survival_3.5-8            memoise_2.0.1             ggbeeswarm_0.7.2         
 [28] zoo_1.8-12                pbapply_1.7-2             R.oo_1.26.0              
 [31] KEGGREST_1.44.1           promises_1.3.0            httr_1.4.7               
 [34] restfulr_0.0.15           globals_0.16.3            fitdistrplus_1.1-11      
 [37] rhdf5filters_1.16.0       ps_1.7.6                  rhdf5_2.48.0             
 [40] rstudioapi_0.16.0         nanotime_0.3.7            UCSC.utils_1.0.0         
 [43] miniUI_0.1.1.1            generics_0.1.3            processx_3.8.4           
 [46] base64enc_0.1-3           curl_5.2.1                zlibbioc_1.50.0          
 [49] ScaledMatrix_1.12.0       polyclip_1.10-6           glmpca_0.2.0             
 [52] GenomeInfoDbData_1.2.12   SparseArray_1.4.3         desc_1.4.3               
 [55] xtable_1.8-4              evaluate_0.23             S4Arrays_1.4.0           
 [58] hms_1.1.3                 irlba_2.3.5.1             colorspace_2.1-0         
 [61] filelock_1.0.3            ROCR_1.0-11               reticulate_1.36.1        
 [64] spatstat.data_3.0-4       magrittr_2.0.3            lmtest_0.9-40            
 [67] readr_2.1.5               nanoarrow_0.4.0.1         later_1.3.2              
 [70] viridis_0.6.5             lattice_0.22-6            spatstat.geom_3.2-9      
 [73] future.apply_1.11.2       scattermore_1.2           XML_3.99-0.16.1          
 [76] triebeard_0.4.1           cowplot_1.1.3             RcppAnnoy_0.0.22         
 [79] pillar_1.9.0              nlme_3.1-164              sna_2.7-2                
 [82] compiler_4.4.0            beachmat_2.20.0           RSpectra_0.16-1          
 [85] stringi_1.8.3             tensor_1.5                GenomicAlignments_1.40.0 
 [88] plyr_1.8.9                crayon_1.5.2              abind_1.4-5              
 [91] BiocIO_1.14.0             locfit_1.5-9.9            bit_4.0.5                
 [94] codetools_0.2-20          BiocSingular_1.20.0       alabaster.ranges_1.4.1   
 [97] plotly_4.10.4             mime_0.12                 intergraph_2.0-4         
[100] splines_4.4.0             Rcpp_1.0.12               fastDummies_1.7.3        
[103] sparseMatrixStats_1.16.0  knitr_1.46                blob_1.2.4               
[106] utf8_1.2.4                BiocVersion_3.19.1        fs_1.6.4                 
[109] listenv_0.9.1             DelayedMatrixStats_1.26.0 pkgbuild_1.4.4           
[112] tibble_3.2.1              Matrix_1.7-0              callr_3.7.6              
[115] statmod_1.5.0             tzdb_0.4.0                network_1.18.2           
[118] pkgconfig_2.0.3           tools_4.4.0               cachem_1.0.8             
[121] RSQLite_2.3.7             viridisLite_0.4.2         DBI_1.2.2                
[124] fastmap_1.1.1             rmarkdown_2.26            scales_1.3.0             
[127] grid_4.4.0                ica_1.0-3                 Rsamtools_2.20.0         
[130] coda_0.19-4.1             patchwork_1.2.0           ggstats_0.6.0            
[133] BiocManager_1.30.23       dotCall64_1.1-1           alabaster.schemas_1.4.0  
[136] RANN_2.6.1                farver_2.1.1              yaml_2.3.8               
[139] rtracklayer_1.64.0        cli_3.6.2                 purrr_1.0.2              
[142] leiden_0.4.3.1            lifecycle_1.0.4           uwot_0.2.2               
[145] arrow_16.1.0              bluster_1.14.0            BiocParallel_1.38.0      
[148] gtable_0.3.5              rjson_0.2.21              ggridges_0.5.6           
[151] progressr_0.14.0          parallel_4.4.0            jsonlite_1.8.8           
[154] RcppHNSW_0.6.0            bitops_1.0-7              bit64_4.0.5              
[157] assertthat_0.2.1          xgboost_1.7.7.1           Rtsne_0.17               
[160] alabaster.matrix_1.4.1    spatstat.utils_3.0-4      BiocNeighbors_1.22.0     
[163] urltools_1.7.3            alabaster.se_1.4.1        metapod_1.12.0           
[166] dqrng_0.3.2               R.utils_2.12.3            alabaster.base_1.4.1     
[169] lazyeval_0.2.2            shiny_1.8.1.1             htmltools_0.5.8.1        
[172] sctransform_0.4.1         rappdirs_0.3.3            glue_1.7.0               
[175] spam_2.10-0               httr2_1.0.1               XVector_0.44.0           
[178] RCurl_1.98-1.14           gridExtra_2.3             tiledbsoma_1.11.1        
[181] R6_2.5.1                  DESeq2_1.44.0             labeling_0.4.3           
[184] SharedObject_1.18.0       cluster_2.1.6             pkgload_1.3.4            
[187] Rhdf5lib_1.26.0           statnet.common_4.9.0      DelayedArray_0.30.1      
[190] tidyselect_1.2.1          vipor_0.4.7               ProtGenerics_1.36.0      
[193] xml2_1.3.6                future_1.33.2             rsvd_1.0.5               
[196] munsell_0.5.1             KernSmooth_2.23-22        data.table_1.15.4        
[199] htmlwidgets_1.6.4         RColorBrewer_1.1-3        rlang_1.1.3              
[202] spatstat.sparse_3.0-3     spatstat.explore_3.2-7    remotes_2.5.0            
[205] fansi_1.0.6               beeswarm_0.4.0    

Thanks!
