Minimum RAM requirements? #225

Closed
jzrapp opened this issue Jul 19, 2019 · 2 comments

jzrapp commented Jul 19, 2019

Hi @voutcn,
is there a way to estimate the minimum memory requirement for processing a dataset like this:
[read_lib_functions-inl.h : 209] Lib 0 (MetaGSB_022018_FD_input.corr.fastq.gz): interleaved, 216053580 reads, 150 max length
[read_lib_functions-inl.h : 209] Lib 1 (SB02metaG_FD_input.corr.fastq.gz): interleaved, 526835864 reads, 150 max length
[read_lib_functions-inl.h : 209] Lib 2 (SB12metaG_FD_input.corr.fastq.gz): interleaved, 362498508 reads, 150 max length
[read_lib_functions-inl.h : 209] Lib 3 ( SI3LmetaG_FD_input.corr.fastq.gz): interleaved, 376801276 reads, 150 max length
[read_lib_functions-inl.h : 209] Lib 4 (SI3UmetaG_FD_input.corr.fastq.gz): interleaved, 589974120 reads, 150 max length

I've been trying to get this running on a new compute cluster for a while now, but it always crashes or is terminated by the system because of memory usage. My previous system had 500 GB of RAM; the new one has nodes with 128 GB each.
I've tried various settings, such as --mem-flag 0 with -m 0.25 and --mem-flag 0 with -m 0.85 (and others), but, maybe not surprisingly, one node doesn't seem to be sufficient.

Thanks for your help!

Owner

voutcn commented Jul 25, 2019

MEGAHIT reads all input reads into memory for graph building. The minimum memory required can be calculated as N / 4 + n * 16 + M, where N is the total number of bases in the input reads, n is the number of reads, and M is the working memory for sorting k-mers. M is related to the distribution of the memory but is usually much less than N / 4 + n * 16.

Try -m 0.99 --mem-flag 0 and if it does not work, one node is insufficient.
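
The formula above can be applied to the library sizes from the log in this issue. The following is a rough sketch, not part of MEGAHIT: the function name `megahit_min_memory_bytes` is hypothetical, the result is assumed to be in bytes (the comment does not state a unit), and N is taken as an upper bound by assuming every read has the maximum length of 150. M is left at 0 because the thread gives no way to compute it in advance.

```python
def megahit_min_memory_bytes(total_bases, num_reads, sort_working_bytes=0):
    """Evaluate N / 4 + n * 16 + M from the comment above.

    Assumption: the result is in bytes. M (sort_working_bytes) defaults
    to 0 because it is not known before the run.
    """
    return total_bases / 4 + num_reads * 16 + sort_working_bytes


# Read counts copied from the five "Lib" log lines in this issue.
read_counts = [216053580, 526835864, 362498508, 376801276, 589974120]
max_read_length = 150

n = sum(read_counts)
# Upper bound on N: assumes every read is max_read_length bases long.
N_upper = n * max_read_length

estimate = megahit_min_memory_bytes(N_upper, n)
print(f"reads (n): {n:,}")
print(f"bases (N, upper bound): {N_upper:,}")
print(f"N / 4 + n * 16: {estimate / 1e9:.1f} GB ({estimate / 2**30:.1f} GiB), excluding M")
```

Under these assumptions the first two terms come to roughly 111 GB (about 103 GiB) before M is added, so the actual requirement also depends on the real read lengths and on M.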

voutcn closed this as completed Jul 25, 2019
@jolespin

I'm trying to figure out how much memory to request for each of my jobs. My largest forward fastq has:

Number of reads 164687282
Summing to a total of 23352969181 bases

How can I work out what M will be beforehand?
