pcrelate: error with rbindlist() #108

Open
AmandaHWChong opened this issue Jan 26, 2024 · 1 comment

AmandaHWChong commented Jan 26, 2024

Hi,

I am running into an error with pcrelate(), possibly because of my large sample size (about 140K samples):

Error in rbindlist(l, use.names, fill, idcol) :
Total rows in the list is 2203757913 which is larger than the maximum number of rows, currently 2147483647
Calls: pcrelate ... pcrelate -> .local -> .pcrelate -> rbind -> rbind -> rbindlist

I have tried increasing sample.block.size. Is there a limit on the sample size that pcrelate() can handle?

Below is my code:
genoData_pruned <- GenotypeBlockIterator(genoData, snpInclude=pruned)
mypcrelate <- pcrelate(genoData_pruned, mypcair$vectors[,1:npca], training.set = mypcair$unrels, sample.block.size=10000, BPPARAM = BiocParallel::SerialParam())

Using 1 CPU cores
140831 samples to be included in the analysis, split into 15 blocks...
Using 1 CPU cores
Betas for 7 PC(s) will be calculated using 93272 samples in training.set...
Calculating Indivdiual-Specific Allele Frequency betas for 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,1)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,2)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,3)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,4)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,5)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,6)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,7)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,8)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,9)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,10)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,11)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,12)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,13)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,14)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (1,15)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,2)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,3)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,4)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,5)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,6)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,7)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,8)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,9)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,10)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,11)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Running PC-Relate analysis for sample block pair (2,12)
Using 1 CPU cores
Running PC-Relate analysis using 280707 SNPs in 29 blocks...
Error in rbindlist(l, use.names, fill, idcol) :
Total rows in the list is 2203757913 which is larger than the maximum number of rows, currently 2147483647
Calls: pcrelate ... pcrelate -> .local -> .pcrelate -> rbind -> rbind -> rbindlist
Execution halted

Thank you for your help!
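
A rough check of the numbers in the error message, as a sketch rather than a statement about pcrelate internals: it assumes the combined table holds one row per unordered sample pair, and the variable names are illustrative only. The limit quoted in the error, 2147483647, is R's maximum integer value.

```r
# Number of samples reported in the log above.
n_samples <- 140831

# Unordered pairs of distinct samples: n * (n - 1) / 2.
# Numeric (double) arithmetic is used so the product does not overflow.
n_pairs <- n_samples * (n_samples - 1) / 2

# Largest value an R integer can hold; matches the limit in the error message.
int_max <- .Machine$integer.max

cat("pairs:      ", format(n_pairs, big.mark = ","), "\n")
cat("row limit:  ", format(int_max, big.mark = ","), "\n")
cat("ratio:      ", round(n_pairs / int_max, 2), "\n")
```

Under that assumption the full pairwise table would need several times more rows than a single table can hold, so a larger sample.block.size changes how the work is split but not the size of the final combined result.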

smgogarten (Collaborator) commented

With this many samples, I think you need to run the components of pcrelate separately, and filter the results to a kinship threshold prior to combining sample blocks. See this comment for a walkthrough of how to do this.
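
The idea of filtering each block before combining can be illustrated with a minimal base R sketch on synthetic data. This is not the linked walkthrough and does not call GENESIS; the column names (ID1, ID2, kin), the helper make_block(), and the threshold value are assumptions chosen for illustration.

```r
set.seed(1)

# Hypothetical stand-in for one sample-block-pair result:
# mostly near-zero kinship, plus 5 clearly related pairs.
make_block <- function(n_rows) {
  data.frame(
    ID1 = sample.int(1000, n_rows, replace = TRUE),
    ID2 = sample.int(1000, n_rows, replace = TRUE),
    kin = c(runif(n_rows - 5, -0.01, 0.01), runif(5, 0.1, 0.5))
  )
}

# Three synthetic block results of 1000 rows each.
blocks <- lapply(rep(1000, 3), make_block)

# Illustrative kinship threshold (an assumption, not a recommended value).
kin_thresh <- 0.02

# Filter each block first, so only the retained rows are ever combined.
filtered <- lapply(blocks, function(b) b[b$kin > kin_thresh, , drop = FALSE])
combined <- do.call(rbind, filtered)

nrow(combined)  # 15: the 5 related pairs kept from each of the 3 blocks
```

Because the filter runs per block, the combined table grows with the number of retained pairs rather than with the total number of pairs. The same pattern applies if the per-block tables are data.table objects combined with rbindlist().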
