IntegrateData() fails: Error in validityMethod(as(object, superClass)) #4

Closed
vertesy opened this issue Dec 9, 2020 · 5 comments
@vertesy
Owner

vertesy commented Dec 9, 2020

| Parameter | Value |
| --- | --- |
| Date | 08/12/2020 |
| Time | ~23:00 |
| Queue | m |
| Node | ? |
| Memory requested | ?1500 |
| CPUs requested | ?65 |
| CPUs used | 30 |

> workers
[1] 30
> tic(); combined.obj <- IntegrateData(anchorset = anchors, dims = 1:p$'n.CC'); toc(); say()
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 17 into 22 33
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|

...

Integrating data
Merging dataset 22 33 17 36 37 35 38 39 40 1 into 6 15 18 19 14 20 7 8 10 11 9
Extracting anchors for merged samples
Finding integration vectors
Finding integration vector weights
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Integrating data
Merging dataset 24 41 23 31 29 30 28 3 21 32 25 26 27 12 13 16 into 6 15 18 19 14 20 7 8 10 11 9 22 33 17 36 37 35 38 39 40 1
Extracting anchors for merged samples
Finding integration vectors
Error in validityMethod(as(object, superClass)) :
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535
In addition: There were 50 or more warnings (use warnings() to see the first 50)

The pipe broke overnight after this, so I cannot see the output of warnings().

@vertesy
Owner Author

vertesy commented Dec 9, 2020

Relevant issues:

satijalab/seurat#2063

I think the issue here is the use of large numbers of genes for features.to.integrate. This creates a non-sparse matrix for all genes, and is infeasible for any method - it's not a specific problem with the Seurat alignment workflow. We do not suggest batch-correcting all genes, only ones that exhibit variation across single cells, which are informative for downstream clustering analyses.

satijalab/seurat#1029

Thanks for the question - we've explored this and the cause is that there are so many anchors, that it creates a sparse matrix with >2^31 elements in R, which can throw an error.

This happens to me when I give a large number of genes for features.to.integrate.

I don't think this is Seurat's problem, but a problem with the Matrix package, which still doesn't support vectors with more than 2^31 elements. It's just that a sparse matrix with too many non-zero elements is produced. This can be worked around by using the sparse matrix package spam64, but that will require changes to Seurat's source code. Actually supporting long vectors is on the to-do list of the Matrix developers, but somehow they still haven't implemented it.
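
The size argument in the quotes above can be checked with a little arithmetic. The sketch below is only an illustration of the dense case from the first quote: the gene and cell counts are hypothetical and not taken from this run, and it does not explain why the error also appeared with as few as 1000 genes.

```r
# Hypothetical sizes, chosen only for illustration (not taken from this run).
n_genes <- 20000    # assumed number of genes passed to features.to.integrate
n_cells <- 150000   # assumed total number of cells across all samples

# Largest element count a non-long R vector can hold: 2^31 - 1.
max_elements <- .Machine$integer.max

# Multiply as doubles; integer multiplication would overflow to NA.
n_elements <- as.numeric(n_genes) * as.numeric(n_cells)

# TRUE when a matrix storing every gene x cell entry would exceed the limit.
exceeds_limit <- n_elements > max_elements
cat(sprintf("elements: %.0f, limit: %d, exceeds: %s\n",
            n_elements, max_elements, exceeds_limit))

# Largest number of genes that would still fit for this many cells
# if every entry were stored.
max_genes <- floor(max_elements / n_cells)
cat("max genes that fit:", max_genes, "\n")
```

Plugging in the real number of cells and features gives a rough idea of whether a fully populated gene-by-cell matrix could fit at all. A sparse matrix only hits the limit once its non-zero entries exceed the same bound.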

@vertesy vertesy added the Seurat label Dec 9, 2020
@vertesy
Owner Author

vertesy commented Dec 9, 2020

Suggestions

  1. Make one dataset the reference.
    Add reference = 1 to the call: anchors <- FindIntegrationAnchors(seus, normalization.method = "SCT", anchor.features = features_use, reference = 1)
  2. Use rPCA -> see the error in issue #8, "Error: Failed to retrieve the result of MulticoreFuture (future_lapply-21) in FindIntegrationAnchors".
  3. Decrease the number of genes for features.to.integrate (debated, error at 1000) -> will try.
  4. Do not specify anything for dims or features.to.integrate:
    IntegrateData( anchorset = scData.Anchors )
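
For reference, suggestions 1 and 2 can be combined roughly as in the sketch below. This is not the configuration that eventually worked in this thread, which is not recorded here. It assumes the Seurat package is installed, that the installed version's FindIntegrationAnchors() accepts reduction = "rpca", and that `seus` is a list of Seurat objects already normalized with SCTransform(). The wrapper name `integrate_rpca_sct` and its arguments are hypothetical.

```r
# Sketch only: assumes Seurat is installed and `seus` is a list of Seurat
# objects that were each normalized with SCTransform().
# `integrate_rpca_sct` is a hypothetical helper, not part of Seurat.
integrate_rpca_sct <- function(seus, n_features = 3000, reference = NULL) {
  # Pick a limited set of variable features shared across the datasets.
  features <- Seurat::SelectIntegrationFeatures(object.list = seus,
                                                nfeatures = n_features)
  seus <- Seurat::PrepSCTIntegration(object.list = seus,
                                     anchor.features = features)
  # rPCA needs a PCA on each object, computed on the shared features.
  seus <- lapply(seus, function(x) {
    Seurat::RunPCA(x, features = features, verbose = FALSE)
  })
  anchors <- Seurat::FindIntegrationAnchors(
    object.list = seus,
    normalization.method = "SCT",
    anchor.features = features,
    reduction = "rpca",
    reference = reference  # e.g. 1 to use the first dataset as reference
  )
  Seurat::IntegrateData(anchorset = anchors, normalization.method = "SCT")
}
```

A call such as combined.obj <- integrate_rpca_sct(seus, reference = 1) would then anchor every dataset to the first one instead of finding anchors between all pairs.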

@vertesy
Owner Author

vertesy commented Apr 14, 2021

  1. Using a single reference dataset makes an assumption that is invalid for our analysis → NO
  2. rPCA finally worked, but it was very tough to get it to run
  3. Decreasing the number of genes did not solve it → NO
  4. Using the defaults did not solve it → NO

@vertesy vertesy closed this as completed Apr 14, 2021
@aelhossiny

Hi, I am facing the same problem. My dataset is around 122k cells from 32 samples. Both methods (CCA and rPCA) fail at the IntegrateData() step. I tried as few as 1k variable features, but it still does not work. How did you get the rPCA method to work?

Note: both methods work fine when integrating using the previous normalization methods (log2 norm)

@tinakeshav

Hey @vertesy, how did you get rPCA to run with SCT normalization? I've been struggling with this for weeks now and would infinitely appreciate any input.
