Memory Error with Clustering with Leiden algorithm matrix - When to use matrix vs igraph method? #155

Closed
WilliamMWei opened this issue Nov 7, 2023 · 8 comments

Comments

@WilliamMWei

Hi,

Thanks for the tool.

I attempted to cluster 45,000 cells with the Leiden algorithm using the default argument `method = "matrix"`, but I ran into a memory error. When I changed to `method = "igraph"`, it ran fine.

The help says to use the igraph method to avoid casting a large dataset to a dense matrix, so it seems to exist simply to handle large datasets. Would you mind letting me know whether there is any other key difference between the igraph and matrix methods in terms of the clustering results? And when should I choose one over the other?
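For a sense of scale, here is a rough back-of-the-envelope estimate of what a dense cast costs at 45,000 cells. It is only a sketch under stated assumptions: 8-byte doubles for the dense matrix, and a hypothetical 20 neighbours per cell with 24 bytes per edge for the sparse comparison. Neither figure comes from the thread, and the function names are illustrative.

```python
# Rough memory estimate for holding an n x n cell graph in dense form,
# compared with a sparse edge list.
# Assumptions (not from the thread): 8-byte doubles in the dense matrix,
# about 20 neighbours per cell, and 24 bytes per stored edge
# (two 8-byte indices plus one 8-byte weight).


def dense_bytes(n_cells: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for an n x n dense matrix of doubles."""
    return n_cells * n_cells * bytes_per_value


def edge_list_bytes(n_cells: int, neighbours_per_cell: int = 20,
                    bytes_per_edge: int = 24) -> int:
    """Bytes needed for an edge list with a fixed number of edges per cell."""
    return n_cells * neighbours_per_cell * bytes_per_edge


n = 45_000
print(f"dense matrix: {dense_bytes(n) / 2**30:.1f} GiB")
print(f"edge list:    {edge_list_bytes(n) / 2**20:.1f} MiB")
```

Under these assumptions the dense matrix alone needs about 15 GiB. The traceback below fails at `adj_mat_py$tolist()`, which turns the matrix into nested Python lists, and those likely need several times more memory than the raw array because every entry becomes a separate Python float object.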

Related post: scverse/scanpy#1053
I have also posted here: satijalab/seurat#7979

Thank you so much for your support!

 pbmc_cd4_cxcr5posneg.data_filtergene_filtercell_list_IndividualDatasetMERGED <- Seurat::FindClusters(pbmc_cd4_cxcr5posneg.data_filtergene_filtercell_list_IndividualDatasetMERGED, algorithm = 4, resolution = 1.2)
Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
  MemoryError
Run `reticulate::py_last_error()` for details.
In addition: There were 12 warnings (use warnings() to see them)

> reticulate::py_last_error()

── Python Exception Message ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
MemoryError

── R Traceback ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
     ▆
  1. ├─Seurat::FindClusters(...)
  2. └─Seurat:::FindClusters.Seurat(...)
  3.   ├─Seurat::FindClusters(...)
  4.   └─Seurat:::FindClusters.default(...)
  5.     └─Seurat:::RunLeiden(...)
  6.       ├─leiden::leiden(...)
  7.       └─leiden:::leiden.matrix(...)
  8.         ├─leiden:::make_py_graph(object, weights = weights)
  9.         └─leiden:::make_py_graph.matrix(object, weights = weights)
 10.           ├─leiden:::make_py_object(object, weights = weights)
 11.           └─leiden:::make_py_object.matrix(object, weights = weights)
 12.             └─adj_mat_py$tolist()
 13.               └─reticulate:::py_call_impl(callable, call_args$unnamed, call_args$named)

> sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8    LC_MONETARY=English_United Kingdom.utf8
[4] LC_NUMERIC=C                            LC_TIME=English_United Kingdom.utf8    

time zone: Europe/London
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] clustree_0.5.0     ggraph_2.1.0       ggplot2_3.4.4      reticulate_1.34.0  knitr_1.44         SeuratObject_4.1.4 Seurat_4.4.0      

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.15.0      jsonlite_1.8.7         magrittr_2.0.3         spatstat.utils_3.0-3   farver_2.1.1          
  [7] rmarkdown_2.25         fs_1.6.3               vctrs_0.6.4            ROCR_1.0-11            memoise_2.0.1          spatstat.explore_3.2-5
 [13] rstatix_0.7.2          htmltools_0.5.6.1      usethis_2.2.2          broom_1.0.5            sctransform_0.4.1      parallelly_1.36.0     
 [19] KernSmooth_2.23-21     htmlwidgets_1.6.2      ica_1.0-3              plyr_1.8.9             plotly_4.10.3          zoo_1.8-12            
 [25] cachem_1.0.8           igraph_1.5.1           mime_0.12              lifecycle_1.0.3        pkgconfig_2.0.3        Matrix_1.6-1.1        
 [31] R6_2.5.1               fastmap_1.1.1          fitdistrplus_1.1-11    future_1.33.0          shiny_1.7.5.1          digest_0.6.33         
 [37] colorspace_2.1-0       patchwork_1.1.3        ps_1.7.5               rprojroot_2.0.3        tensor_1.5             irlba_2.3.5.1         
 [43] pkgload_1.3.3          ggpubr_0.6.0           labeling_0.4.3         progressr_0.14.0       fansi_1.0.5            spatstat.sparse_3.0-2 
 [49] httr_1.4.7             polyclip_1.10-6        abind_1.4-5            compiler_4.3.1         here_1.0.1             remotes_2.4.2.1       
 [55] withr_2.5.1            backports_1.4.1        viridis_0.6.4          carData_3.0-5          pkgbuild_1.4.2         ggforce_0.4.1         
 [61] ggsignif_0.6.4         MASS_7.3-60            rappdirs_0.3.3         sessioninfo_1.2.2      tools_4.3.1            lmtest_0.9-40         
 [67] httpuv_1.6.12          future.apply_1.11.0    goftest_1.2-3          glue_1.6.2             callr_3.7.3            nlme_3.1-162          
 [73] promises_1.2.1         grid_4.3.1             checkmate_2.2.0        Rtsne_0.16             cluster_2.1.4          reshape2_1.4.4        
 [79] generics_0.1.3         gtable_0.3.4           spatstat.data_3.0-3    tidyr_1.3.0            data.table_1.14.8      tidygraph_1.2.3       
 [85] sp_2.1-1               car_3.1-2              utf8_1.2.4             spatstat.geom_3.2-7    RcppAnnoy_0.0.21       ggrepel_0.9.4         
 [91] RANN_2.6.1             pillar_1.9.0           stringr_1.5.0          later_1.3.1            splines_4.3.1          tweenr_2.0.2          
 [97] dplyr_1.1.3            lattice_0.21-8         survival_3.5-5         deldir_1.0-9           tidyselect_1.2.0       miniUI_0.1.1.1        
[103] pbapply_1.7-2          gridExtra_2.3          scattermore_1.2        xfun_0.40              graphlayouts_1.0.1     devtools_2.4.5        
[109] matrixStats_1.0.0      stringi_1.7.12         lazyeval_0.2.2         yaml_2.3.7             evaluate_0.22          codetools_0.2-19      
[115] tibble_3.2.1           BiocManager_1.30.22    cli_3.6.1              uwot_0.1.16            xtable_1.8-4           munsell_0.5.0         
[121] processx_3.8.2         Rcpp_1.0.11            globals_0.16.2         spatstat.random_3.2-1  png_0.1-8              parallel_4.3.1        
[127] ellipsis_0.3.2         prettyunits_1.2.0      profvis_0.3.8          urlchecker_1.0.1       listenv_0.9.0          viridisLite_0.4.2     
[133] scales_1.2.1           ggridges_0.5.4         leiden_0.4.3           purrr_1.0.2            crayon_1.5.2           rlang_1.1.1           
[139] cowplot_1.1.1         

@denvercal1234GitHub

@szhorvat
Contributor

szhorvat commented Nov 7, 2023

Using a matrix is not a feature of this library. It is entirely specific to the leiden R package, which will convert that matrix to a graph before doing any community detection.

Given what the leiden package does, the claim in Seurat's documentation that the "matrix" method is faster for small data seems rather strange ... maybe it has to do with inefficient transfer of data between R and Python.

@denvercal1234

denvercal1234 commented Nov 7, 2023

Thanks @szhorvat. Just so I understand correctly: did you mean that specifying method="matrix" or method="igraph" does not really affect the resulting clusters, and only changes how efficiently the data is processed before community detection? For example, with a large dataset, specifying method="igraph" would skip the conversion of the data to a dense matrix, thereby speeding up the whole clustering (community detection).

@szhorvat
Contributor

szhorvat commented Nov 7, 2023

Presumably yes. But you need to discuss this with the packages that implemented these methods. This choice of methods does not come from the leidenalg Python package.

@denvercal1234

Thanks Szabolcs. Hopefully someone from Seurat will give some input.

@SamGG

SamGG commented Nov 7, 2023

@TomKellyGenetics could probably shed some light on this.

My opinion: maybe it's time to use igraph directly.
https://igraph.org/r/doc/cluster_leiden.html

@TomKellyGenetics
Contributor

TomKellyGenetics commented Nov 9, 2023

> My opinion: maybe it's time to use igraph directly.
> https://igraph.org/r/doc/cluster_leiden.html

@SamGG the leiden package already does this by default for igraph objects, although fewer parameters are supported than when calling Python. This has been supported for over a year, since the 0.4 version.

> maybe it has to do with inefficient transfer of data between R and Python

@szhorvat that’s correct, it does: reticulate supports dense matrices but not sparse matrices or igraph objects, so igraph objects are passed as an edge list and recreated in Python. This only applies to older versions of the R package, for the reasons discussed above, so the comment in the Seurat documentation is likely no longer relevant for users running igraph 1.2.7 and leiden 0.4.0 or later.
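As an illustration of why the two input routes should describe the same graph, here is a minimal pure-Python sketch of converting a dense adjacency matrix to a weighted edge list and rebuilding it. The helper names are hypothetical and this is not the leiden R package's code. It assumes an undirected graph with a symmetric matrix and no self-loops.

```python
# Sketch: a dense adjacency matrix and a weighted edge list carry the same graph.
# Helper names are illustrative only; assumes a symmetric matrix, no self-loops.


def dense_to_edges(adj):
    """Return (i, j, weight) for every non-zero entry in the upper triangle."""
    n = len(adj)
    return [(i, j, adj[i][j])
            for i in range(n)
            for j in range(i + 1, n)
            if adj[i][j] != 0]


def edges_to_dense(edges, n):
    """Rebuild a symmetric n x n adjacency matrix from a weighted edge list."""
    adj = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w
    return adj


# Toy undirected graph on 4 nodes.
adj = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.2, 0.0],
    [0.0, 0.2, 0.0, 0.9],
    [0.0, 0.0, 0.9, 0.0],
]

edges = dense_to_edges(adj)
print(edges)                            # three weighted edges
print(edges_to_dense(edges, 4) == adj)  # round trip recovers the matrix
```

The edge list stores only the three non-zero connections instead of all sixteen matrix entries. This is why passing edges scales much better for sparse neighbour graphs, while the graph handed to community detection stays the same.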

@SamGG

SamGG commented Nov 9, 2023

Thanks for this information and your feedback.

@vtraag
Owner

vtraag commented Dec 19, 2023

Thanks all for commenting in my absence! I believe all questions are addressed, so I'm closing this.

@vtraag vtraag closed this as completed Dec 19, 2023