MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
Latest commit ef1bae6 Mar 2, 2018

If you are using MEGAHIT v1.0.4-beta or v1.0.5, please update to the latest version.

Getting Started

git clone https://github.com/voutcn/megahit.git
cd megahit
make
./megahit -1 pe_1.fq.gz -2 pe_2.fq.gz -o megahit_out

Introduction

MEGAHIT is a single-node assembler for large and complex metagenomics NGS reads, such as those from soil. It uses a succinct de Bruijn graph (SdBG) to achieve low-memory assembly. MEGAHIT can optionally use a CUDA-enabled GPU to accelerate SdBG construction. The GPU-accelerated version has been tested on an NVIDIA GTX680 (4 GB memory) and a Tesla K40c (12 GB memory) with CUDA 5.5, 6.0 and 6.5. MEGAHIT v1.0 and later also support IBM PowerPC and have been tested on IBM POWER8.
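An SdBG is a compressed representation of a de Bruijn graph. To illustrate only the underlying graph (not MEGAHIT's succinct encoding), here is a minimal, uncompressed sketch: every distinct k-mer in the reads becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function name build_dbg is hypothetical, and reverse complements are ignored for brevity, which a real assembler would not do.

```python
from collections import defaultdict


def build_dbg(reads, k):
    """Build a plain (non-succinct) de Bruijn graph as {node: set(successors)}.

    Nodes are (k-1)-mers; each k-mer contributes one edge prefix -> suffix.
    Reverse complements are NOT merged in this illustrative sketch.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" in kmer:  # skip k-mers containing ambiguous bases
                continue
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)
```

For example, build_dbg(["ACGTAC", "GTACGA"], 3) links CG to both GT and GA, the kind of branch an assembler must resolve when walking the graph into contigs.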

Dependency & Installation

MEGAHIT is suitable for 64-bit Linux and Mac OS X. It requires zlib, Python 2.6 or later, and G++ 4.4 or later (with -std=c++0x and OpenMP support). Note that on Mac OS X, the g++ on the PATH is probably a symlink to clang, which does not support OpenMP. Users should install a "real" G++ and specify it with make CXX=/PATH/TO/G++.
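Before building on Mac OS X, it can help to check whether the g++ on the PATH is actually clang. The sketch below is a simple heuristic, assuming that clang's `--version` output mentions "clang"; the helper names are hypothetical and not part of MEGAHIT.

```python
import subprocess


def looks_like_clang(version_text):
    """Heuristic: True if compiler --version output mentions clang."""
    return "clang" in version_text.lower()


def cxx_is_clang(cxx="g++"):
    """Run `<cxx> --version` and apply the heuristic (Python 3.7+).

    Raises FileNotFoundError if the compiler is not on the PATH.
    """
    result = subprocess.run([cxx, "--version"], capture_output=True, text=True)
    return looks_like_clang(result.stdout + result.stderr)
```

If cxx_is_clang() returns True, pass a GNU compiler explicitly with make CXX=/PATH/TO/G++ as described above.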

To increase the maximum k-mer size allowed, modify the value of kMaxK in definitions.h.
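As a rough illustration of why the maximum k is a compile-time setting: assuming k-mers are packed at 2 bits per base into fixed-size machine words (a common DNA encoding; MEGAHIT's actual layout is defined in its sources and is not described here), the storage per k-mer grows linearly with k. This hypothetical helper computes the number of words needed.

```python
def words_per_kmer(k, word_bits=32, bits_per_base=2):
    """Words needed to store one k-mer, assuming 2-bit-per-base packing."""
    total_bits = k * bits_per_base
    return (total_bits + word_bits - 1) // word_bits  # ceiling division
```

Under these assumptions, words_per_kmer(127) is 8 while words_per_kmer(255) is 16, so raising the maximum k would roughly double per-k-mer storage in structures sized for the maximum.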

The GPU version additionally requires CUDA 5.5 or later. Compile it with make use_gpu=1, and pass --use-gpu when running MEGAHIT to activate GPU acceleration.

Binary releases are available on the release page.

To install MEGAHIT to another directory, please copy megahit, megahit_asm_core, megahit_toolkit and megahit_sdbg_build (and megahit_sdbg_build_gpu for GPU counterpart) to the destination.
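The copy step can be scripted. Below is a minimal sketch using the executable names listed in the paragraph above; the function install_megahit is hypothetical, and a plain cp works equally well.

```python
import os
import shutil

# Executables named in the installation instructions.
BINARIES = ["megahit", "megahit_asm_core", "megahit_toolkit", "megahit_sdbg_build"]


def install_megahit(src_dir, dest_dir, use_gpu=False):
    """Copy MEGAHIT executables from src_dir into dest_dir.

    Adds megahit_sdbg_build_gpu when use_gpu is True. Returns the copied
    paths; raises FileNotFoundError if an expected executable is missing.
    """
    names = BINARIES + (["megahit_sdbg_build_gpu"] if use_gpu else [])
    os.makedirs(dest_dir, exist_ok=True)
    installed = []
    for name in names:
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            raise FileNotFoundError("missing build output: " + src)
        # copy2 preserves permission bits (including +x) and timestamps.
        installed.append(shutil.copy2(src, dest_dir))
    return installed
```

Checking for every file before declaring success avoids a partial install that fails only at run time.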

Running MEGAHIT

Once MEGAHIT has been compiled successfully, run it with the following command:

./megahit [options] {-1 <pe_1.fq> -2 <pe_2.fq> | --12 <pe12.fq> | -r <se.fq>}

-1/-2, --12 and -r specify paired-end, interleaved paired-end and single-end input files, respectively. They accept files in FASTA (.fasta, .fa, .fna) or FASTQ (.fastq, .fq) format, including gzip-compressed (.gz) and bzip2-compressed (.bz2) files. Run ./megahit -h for the detailed usage message.
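To sanity-check input files before a long run, the extension rules above can be mirrored in a small helper. This is a sketch based only on file extensions (case-insensitive matching is an assumption); MEGAHIT's own input handling may differ, and classify_input is a hypothetical name.

```python
import os

FASTA_EXTS = {".fasta", ".fa", ".fna"}
FASTQ_EXTS = {".fastq", ".fq"}
COMPRESSION_EXTS = {".gz": "gzip", ".bz2": "bzip2"}


def classify_input(path):
    """Return (format, compression) for a read file based on its extension.

    format is "fasta" or "fastq"; compression is "gzip", "bzip2" or None.
    Raises ValueError for any other extension.
    """
    root, ext = os.path.splitext(path.lower())
    compression = COMPRESSION_EXTS.get(ext)
    if compression is not None:
        # Strip the compression suffix and inspect the inner extension.
        root, ext = os.path.splitext(root)
    if ext in FASTA_EXTS:
        return "fasta", compression
    if ext in FASTQ_EXTS:
        return "fastq", compression
    raise ValueError("unrecognised read file extension: " + path)
```

For example, classify_input("pe_1.fq.gz") returns ("fastq", "gzip").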

Assembly Tips

For suggestions on fine-tuning parameters for specific datasets, please see our wiki.

FAQ & Reporting issues

For other questions, please consult our wiki first. If necessary, report an issue on GitHub.

Citing MEGAHIT

If you use MEGAHIT v0.x, or want to cite MEGAHIT for general purposes (e.g., in a review), please cite:

  • Li, D., Liu, C-M., Luo, R., Sadakane, K., and Lam, T-W., (2015) MEGAHIT: An ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, doi: 10.1093/bioinformatics/btv033 [PMID: 25609793].

If you use MEGAHIT v1.0 or later, or assemblies from MEGABOX, please also cite:

  • Li, D., Luo, R., Liu, C-M., Leung, C-M., Ting, H-F., Sadakane, K., Yamashita, H., and Lam, T-W., (2016) MEGAHIT v1.0: A Fast and Scalable Metagenome Assembler driven by Advanced Methodologies and Community Practices. Methods.

License & Support

    MEGAHIT
    Copyright (C) 2014-2015  The University of Hong Kong & L3 Bioinformatics Limited

    This program is free software: you can redistribute it and/or modify
    it under the terms of the GNU General Public License as published by
    the Free Software Foundation, either version 3 of the License, or
    (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program.  If not, see <http://www.gnu.org/licenses/>.

MEGAHIT is released under GPLv3. For personalized customization and commercial support, please contact L3 Bioinformatics Limited (rb at l3-bioinfo.com).