- continue mode also work at the first run. suggested by https://github.com/voutcn/megahit/issues

- minor change in README
voutcn committed Jan 27, 2015
1 parent 0d6e306 commit e248ce0
Showing 2 changed files with 23 additions and 21 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -36,14 +36,14 @@ To use the GPU version, run `make use_gpu=1` to compile MEGAHIT, and run MEGAHIT

Memory Control
----------------
We recommend to set `-m` as large as possible. In general, 90% of the free memory is recommended. This parameter is used to control the maximum memory that can be used for the SdBG construction. It is required to prevent the program from using swap space.
We recommend to set `-m` as large as possible. In general, 90-95% of the free memory is recommended. For example if the node have 64G available memory, a proper setting could be `-m 60e9`. This parameter is used to control the maximum memory that can be used for the SdBG construction. It is required to prevent the program from using swap space.

Since v0.2.0, it is not necessary for the SdBG builder to use up all the memory specificed by `-m`. The option `--mem-flag` specifies the ways to utilize memory: `--mem-flag 0` to use minimum memory, `--mem-flag 1` moderate memory and `--mem-flag 2` all memory.
Since v0.2.0, it is not necessary for the SdBG builder to use up all the memory specificed by `-m`. The option `--mem-flag` specifies the ways to utilize memory: `--mem-flag 0` to use minimum memory, `--mem-flag 1` (default) moderate memory and `--mem-flag 2` all memory.
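
As a rough illustration of the 90-95% guideline, the value for `-m` can be derived from the node's free memory in bytes. This is a minimal sketch and not part of MEGAHIT; the helper name `suggest_memory` and its `fraction` parameter are hypothetical.

```python
def suggest_memory(free_bytes, fraction=0.9):
    """Return a byte count to pass to -m, given the node's free memory.

    Hypothetical helper: `fraction` follows the README's 90-95% guideline.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(free_bytes * fraction)


# A node with 64e9 bytes free, using 93.75% of it:
print("-m %d" % suggest_memory(64e9, 0.9375))
```

This prints `-m 60000000000`, the same value as the `-m 60e9` setting mentioned for a node with 64G of available memory.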

Input Files
--------------

MEGAHIT accepts one fasta or fastq file as input. The input file can be gzip'ed. Alternatively, you can use the option `--input-cmd` to input reads from multiple files. Following the `--input-cmd` should be a command that outputs all reads to `STDOUT` in fasta or fastq format. A mix of fasta and fastq is also supported. Pair-end information is not used by MEGAHIT currently. Therefore pair-end files can be input to MEGAHIT as multiple single-end files. Some examples are shown below.
MEGAHIT accepts one fasta or fastq file as input. The input file can be gzip'ed. Alternatively, you can use the option `--input-cmd` to input reads from multiple files. Following the `--input-cmd` should be a command that outputs all reads to `STDOUT` in fasta or fastq format. A mix of fasta and fastq is also supported from version 0.2.0. Pair-end information is not used by MEGAHIT currently. Therefore pair-end files can be input to MEGAHIT as multiple single-end files. Some examples are shown below.

###Correct Examples
* Input from one gzip'ed fastq file named *reads.fastq.gz*:
@@ -62,7 +62,7 @@ MEGAHIT accepts one fasta or fastq file as input. The input file can be gzip'ed.
```
--input-cmd "fastq-dump -Z --fasta xxx.sra"
```
* Mixed fastq and fasta:
* Mixed fastq and fasta (supported since v0.2.0):
```
--input-cmd "cat 1.fa 2.fq"
```
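
Since pair-end files are passed as multiple single-end files, the `--input-cmd` string for uncompressed inputs can be assembled mechanically. The sketch below is an assumption-laden illustration: `build_input_cmd` is a hypothetical helper, it only handles plain (not gzip'ed) files, and it assumes the command is interpreted by a shell, which the README does not state.

```python
import shlex


def build_input_cmd(paths):
    """Build a `cat` command string for --input-cmd from uncompressed read files.

    Hypothetical helper; assumes every path is a plain fasta or fastq file.
    """
    if not paths:
        raise ValueError("at least one read file is required")
    # shlex.join quotes any path containing spaces or shell metacharacters.
    return shlex.join(["cat"] + list(paths))


# Paired-end files are passed as two single-end files.
print(build_input_cmd(["sample_1.fq", "sample_2.fq"]))
```

The file names here are placeholders. Quoting the paths guards against names with spaces, at the cost of relying on shell-style parsing of the command.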
36 changes: 19 additions & 17 deletions megahit
@@ -43,7 +43,7 @@ Required Arguments:
-m/--memory <float> max memory in byte to be used.
This argument is used to optimize the graph building module,
to prevent the SdBG builder from allocating memory larger than this value.
This value is recommended to be 90% of machine's (free) memory.
This value is recommended to be 90-95% of the machine's (free) memory.
The SdBG builder is unnecessary using all memory. Please refer to --mem-flag.
-l/--max-read-len <int> maximum read length
@@ -80,16 +80,17 @@ Optional Arguments:
this value less than 2 will lead to much more larger memory usage.
More detail assembly options:
--no-mercy Do not add mercy (k+1)-mer for k=k_min, default: off
--no-low-local Do not progressively remove low local coverage contigs, default: off
--low-local-ratio <float> Ratio threshold to define low local coverage contigs. Default: 0.2
--max-tip-len <int> Tips with length less than this value will be removed.
--no-mercy do not add mercy (k+1)-mer for k=k_min, default: off
--no-low-local do not progressively remove low local coverage contigs, default: off
--low-local-ratio <float> ratio threshold to define low local coverage contigs. Default: 0.2
--max-tip-len <int> tips with length less than this value will be removed.
Default: 2*k for iteration of kmer_size=k
--no-bubble Do not remove bubbles, default: off
--no-bubble do not remove bubbles, default: off
Other Arguments:
--continue continue from the last available check point
-h/--help print the usage message
--continue continue a MEGAHIT run from its last available check point.
please set output directory correctly if you use this option.
-h/--help print the usage message
'''

class Usage(Exception):
@@ -217,7 +218,8 @@ def parse_opt(argv):
elif option == "--mem-flag":
opt.mem_flag = int(value)
elif option == "--continue":
need_continue = True
if opt.continue_mode == 0: # avoid check again again again...
need_continue = True
else:
print >> sys.stderr, "Invalid option %s", option
exit(1)
@@ -242,8 +244,8 @@ def check_opt():
print >> sys.stderr, "Both read_file and input_cmd are set. Please use only one of them."
exit(1)
if opt.k_max >= opt.max_read_len:
opt.k_max = int(max_read_len) / 2 * 2 - 1
print >> sys.stderr, "Constrain maximum read length, k_max is set to be " + str(k_max)
opt.k_max = int(opt.max_read_len) / 2 * 2 - 1
print >> sys.stderr, "Constrain maximum read length, k_max is set to be " + str(opt.k_max)
if opt.k_min < 9:
print >> sys.stderr, "k_min should be at least 9."
exit(1)
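
The `k_max` adjustment in `check_opt()` lowers `k_max` to the largest odd number below the maximum read length. The script is Python 2, where `/` on integers is floor division; the sketch below restates the same rule in Python 3 with `//`. The function name `clamp_k_max` is hypothetical and only the arithmetic comes from the diff.

```python
def clamp_k_max(k_max, max_read_len):
    """Mirror the check_opt() rule for k_max.

    If k_max >= max_read_len, lower it to an odd value below the read length.
    """
    if k_max >= max_read_len:
        # `//` is Python 3 floor division, matching Python 2's `/` on integers.
        k_max = int(max_read_len) // 2 * 2 - 1
    return k_max


for read_len in (100, 101):
    print(read_len, clamp_k_max(127, read_len))
```

Both read lengths give 99: halving and doubling rounds down to an even number, and subtracting one makes it odd.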
@@ -284,20 +286,22 @@ def write_opt(argv):

def prepare_continue():
global opt # out_dir is already set
print >> sys.stderr, "Continue mode activated. Ignore all options other than -o/--out-dir."

if not os.path.exists(opt.out_dir + "opts.txt"):
print >> sys.stderr, "Cannot find" + opt.out_dir + "opts.txt"
print >> sys.stderr, "Cannot find " + opt.out_dir + "opts.txt"
print >> sys.stderr, "Please check whether the output directory is correctly set by \"-o\""
exit(1)
print >> sys.stderr, "Now switching to normal mode."
return

print >> sys.stderr, "Continue mode activated. Ignore all options other than -o/--out-dir."

with open(opt.out_dir + "opts.txt", "r") as f:
line = f.readline()
argv = line.strip().split()
print >> sys.stderr, "Continue with options: " + line.strip()
opt = Options()
opt.continue_mode = 1 # avoid dead loop
parse_opt(argv)
opt.continue_mode = 1
f.close()

opt.last_cp = -1
@@ -310,8 +314,6 @@ def prepare_continue():
cpf.close()
print >> sys.stderr, "Continue from check point " + str(opt.last_cp)



def check_bin():
if not os.path.exists(opt.bin_dir + "megahit_assemble"):
print >> sys.stderr, megahit_version_str + '\n' + "Cannot find sub-program \"megahit_assemble\", please recompile."