combo_prep.py running out of memory #6
Hi Tom,
Yeah, I think as it stands you'd have to subdivide the linkage groups.
combo_prep.py is just a lightly enhanced version of msmc-tools'
generate_multihetsep.py, so you could try running generate_multihetsep.py
to see whether it's something specific to the added code that's eating up
all the memory, or whether it's just the amount of data.
Sorry not to have a better answer!
Daniel
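If subdividing does turn out to be necessary, here is a minimal sketch of one way to carve a linkage group into fixed-size windows; the resulting region strings are the form `bcftools view -r` accepts. The chromosome name, length, and window size below are made-up examples, not recommendations:

```python
def windows(chrom_len, size):
    """Yield 1-based, inclusive (start, end) windows covering a linkage group."""
    for start in range(1, chrom_len + 1, size):
        yield start, min(start + size - 1, chrom_len)

# e.g. a 10 Mbp linkage group in 3 Mbp chunks:
for start, end in windows(10_000_000, 3_000_000):
    print(f"chr1:{start}-{end}")  # region string for bcftools view -r
```

Each window could then be prepared and analysed independently, which caps memory at whatever a single window needs.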
…On Thu, Dec 10, 2020 at 7:16 PM tomoosting ***@***.***> wrote:
Hi,
I am trying to run combo_prep.py with 350 whole-genome sequences.
The program keeps running out of memory (400 GB RAM), and I'm pretty much
pushing the limits of our system.
Looking online, I see suggestions for batch-processing data in Python.
Is there an option for running this script with batch processing, or
would I have to subdivide my linkage groups in order to read in the data?
I was assuming I'd have to analyse all samples for one genomic region in
a single analysis.
Any advice on how best to approach an analysis of this many samples is
very welcome and much appreciated.
Many thanks,
Tom
Hi Daniel,
Will give that a try, thanks for the reply!
Cheers,
Hi Daniel,
I’ve reopened this issue as I’ve done some more research into why my runs keep crashing. Even with 50 samples and a 50 Kbp section, my resource use reached over 200 GB of RAM, giving me the following error:
Traceback (most recent call last):
I’d like to analyse close to 200 samples in a single run if possible, at least 100. Any idea what might be going on? I've added the 50 Kbp files of 50 samples to Google Drive. I’ve used the following syntax, following the examples' recommendations ():
Many thanks,
Hmm, can you try just running msmc-tools' generate_multihetsep.py on the output from bcftools to check how the memory usage compares? That would help me narrow down what could be causing the problem.
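For what it's worth, one way to get comparable peak-memory numbers for the two scripts is to run each as a child process and read the peak resident set size afterwards. This is a rough sketch, not part of either tool: the script paths in the usage comment are placeholders, and on Linux `ru_maxrss` is in kilobytes while macOS reports bytes:

```python
import resource
import subprocess
import sys

def peak_child_rss(cmd):
    """Run cmd to completion and return the peak RSS of child processes.

    Units are kB on Linux, bytes on macOS. RUSAGE_CHILDREN accumulates over
    all waited-for children, so run each measurement in a fresh interpreter
    for a clean comparison.
    """
    subprocess.run(cmd, check=True)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss

# Hypothetical usage -- script names and arguments are placeholders:
# print(peak_child_rss([sys.executable, "combo_prep.py", "..."]))
# print(peak_child_rss([sys.executable, "generate_multihetsep.py", "..."]))
```

Running both on the same input would show whether the extra memory comes from the added code or is shared with the upstream script.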
Aha, thanks, this is helpful. I'll see if I can track it down.
…On Thu, Jan 14, 2021 at 3:02 AM tomoosting ***@***.***> wrote:
Memory usage is even higher when I run msmc-tools' generate_multihetsep.py.
The x axis is the number of samples and the y axis is GB of RAM used.
The analysis was run on the same linkage group.
[image: plot of RAM usage vs. number of samples]
<https://user-images.githubusercontent.com/40846461/104560701-22e8a380-56ab-11eb-9d14-bcd4bea249b4.png>
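If memory really does grow linearly with sample count, one plausible culprit (a guess from the plot above, not from reading the script) is that every per-sample record is loaded into memory before being combined. A streaming alternative keeps only one pending record per open input, for example with Python's `heapq.merge` over already position-sorted streams:

```python
import heapq

def merged_sites(per_sample_streams):
    """Lazily merge sorted (position, sample_id) streams.

    heapq.merge holds just one pending record per stream, so memory stays
    proportional to the number of samples, not the number of sites.
    """
    return heapq.merge(*per_sample_streams)

# Toy example with two samples' sorted site positions:
s1 = iter([(10, "s1"), (40, "s1")])
s2 = iter([(20, "s2"), (30, "s2")])
print(list(merged_sites([s1, s2])))
# [(10, 's1'), (20, 's2'), (30, 's2'), (40, 's1')]
```

Whether this applies to combo_prep.py depends on how it actually buffers its inputs, but a profile run would show it quickly.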