[Error] std::bad_alloc #6

Closed
yassineS opened this issue May 14, 2018 · 6 comments

@yassineS

Hiya,

I am trying to run dystruct on a fairly large dataset: 1,200 individuals with 800k SNPs after LD pruning. I'm running this on a large machine with 240 GB of RAM and OMP_NUM_THREADS=62. After running for 2 hours or so, I got the following core dump:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)
@tyjo (Owner) commented May 14, 2018

Thanks for opening this issue. It looks like you are running out of memory somewhere. Does the program output anything besides the error? Can you include your system specifications (OS/compiler)? I wonder if you are hitting a resource limit set by the OS.

One thing to consider: you won't see a performance benefit once the number of threads exceeds the number of populations (K). Does decreasing the number of threads solve the issue?
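For what it's worth, here is a minimal standalone sketch (not Dystruct code; it assumes a POSIX system and OpenMP) that prints the process's address-space limit and the number of threads OpenMP will use, which can help rule out an OS-imposed limit:

```cpp
// check_limits.cpp -- compile with: g++ -fopenmp check_limits.cpp -o check_limits
#include <cstdio>
#include <sys/resource.h>
#include <omp.h>

int main() {
    rlimit lim;
    // RLIMIT_AS caps the total virtual address space; hitting it raises
    // std::bad_alloc even when plenty of physical RAM is still free.
    if (getrlimit(RLIMIT_AS, &lim) == 0) {
        if (lim.rlim_cur == RLIM_INFINITY)
            std::printf("address-space limit: unlimited\n");
        else
            std::printf("address-space limit: %.1f GB\n",
                        lim.rlim_cur / (1024.0 * 1024.0 * 1024.0));
    }
    std::printf("OpenMP will use up to %d threads\n", omp_get_max_threads());
    return 0;
}
```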

@yassineS (Author)

Thanks for the quick feedback. I tried with OMP_NUM_THREADS=K, but it still failed the same way, and this is the only output I'm getting:

loading genotype matrix...
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

I feel like I need to be more aggressive with my pruning (use a lower R^2). I will also try to thin the dataset a little, unless you can think of an alternative approach.

@tyjo (Owner) commented May 15, 2018

It looks like it is running out of memory when loading the data. However, if each SNP takes 4 bytes, then 800k SNPs in 1,200 individuals should only use approximately (4 x 800k x 1200) / (1024^3) ≈ 3.6 GB of memory, so I wonder if something else is going on here.
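As a sanity check on that arithmetic, here is a throwaway snippet (not Dystruct code; the 4 bytes per stored genotype is the assumption above):

```cpp
// Back-of-envelope estimate of the genotype matrix footprint.
#include <cstdio>

int main() {
    const double bytes_per_genotype = 4.0;   // assumed storage per entry
    const double n_snps = 800000.0;
    const double n_individuals = 1200.0;
    const double gib = bytes_per_genotype * n_snps * n_individuals
                       / (1024.0 * 1024.0 * 1024.0);
    std::printf("expected genotype matrix size: %.2f GiB\n", gib);  // ~3.58 GiB
    return 0;
}
```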

Do you mind uploading the header of your input file (just the generation times, not genotypes)? I will try to replicate using dummy data. If Dystruct is allocating more memory than it should, I will upload a fix.

Depending on the number of time points in your data, 800K SNPs might be a bit large to run in a reasonable amount of time.

@yassineS (Author) commented May 18, 2018 via email

@tyjo (Owner) commented May 18, 2018

Both our model and ADMIXTURE assume unlinked sites, so you want to do some amount of LD pruning. I think LD thresholds similar to those used for thinning SNPs for ADMIXTURE should work with our model too. There is a trade-off, though, because removing sites in high LD disproportionately affects individuals with missing data. Section 6 of the supplementary information of this paper briefly discusses LD pruning for ADMIXTURE and ancient DNA.

I generated dummy data using your headers, but I couldn't replicate the memory problem. On my machine, 800,000 SNPs took up 1768 MB while loading the data, which increased to 3051 MB during the inference algorithm. Just in case there is a problem with the input file, I uploaded code that runs a few more checks on the input file.
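For reference, the new checks are along these lines (a simplified sketch only, not the code actually pushed to the repository; it assumes a whitespace-delimited file whose first row holds the generation times and whose remaining rows each hold one SNP, with genotypes coded 0, 1, 2, or 9 for missing):

```cpp
// check_input.cpp -- rough input-file sanity checks (illustrative only)
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "usage: " << argv[0] << " GENOTYPE_FILE\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    if (!in) {
        std::cerr << "cannot open " << argv[1] << "\n";
        return 1;
    }

    std::string line;
    std::getline(in, line);                      // first row: generation times
    std::istringstream header(line);
    size_t n_times = 0;
    for (double t; header >> t; ) ++n_times;
    std::cerr << "header contains " << n_times << " generation times\n";

    size_t n_snps = 0, n_cols = 0;
    while (std::getline(in, line)) {
        std::istringstream row(line);
        size_t cols = 0;
        for (int g; row >> g; ++cols) {
            if (g != 0 && g != 1 && g != 2 && g != 9) {   // 9 assumed = missing
                std::cerr << "unexpected genotype " << g
                          << " at SNP " << n_snps << "\n";
                return 1;
            }
        }
        if (n_cols == 0) n_cols = cols;
        else if (cols != n_cols) {
            std::cerr << "SNP " << n_snps << " has " << cols
                      << " entries; expected " << n_cols << "\n";
            return 1;
        }
        ++n_snps;
    }
    std::cerr << n_snps << " SNPs x " << n_cols << " individuals look consistent\n";
    return 0;
}
```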

@yassineS (Author)

Thank you Tyler for your help. I will investigate my setup and input file.
