metaOthello

Installation

Download code from https://github.com/xa6xa6/metaOthello/

There are two code folders:

Builder -- The code under "build" is used to generate the MetaOthello index.
Classifier -- The code under "classifier" is used to perform taxonomic classification of sequencing reads using MetaOthello Index.

Builder usage

Ready-built indexes indexes

Building indexing is very time-consuming (costs about 6 hours to build index for NCBI/refseq bacterial genome database). Therefore, we provide ready-built indexes (NCBI/refseq bacterial genome database) for users to download:

20mer index: https://drive.google.com/open?id=0BxgO-FKbbXRIYWREa2NwejlVYUU
25mer index: https://drive.google.com/open?id=0BxgO-FKbbXRIY1pRaHJsYVg5dTQ
31mer index: https://drive.google.com/open?id=0BxgO-FKbbXRIa0Flc3Q4bWtycGM

Building your own index

If you want to build an index with your own reference sequences, follow these steps.

Jellyfish is used to prepare k-mer files for each reference sequence. Download jellyfish from: http://www.cbcb.umd.edu/software/jellyfish/

Produce a k-mer count file for your reference seqeuences. Command:

    jellyfish count \
    –o <path_to_bacterial_rawKmerCountFile> \
    -m <Kmer_length> \
    -t <threads_num> \
    -s <bf_size> \
    -C <path_to_bacterial_referenceSeqFastaFile>

Dump k-mers to human-readable format. Command:

    jellyfish dump \
    –t –c \
    –o <path_to_bacterial_readableKmerCountFile> \
    <path_to_bacterial_rawKmerCountFile>

Generate taxonomy info file. Put all readable k-mer count files into the same directory <path_to_bacterial_reference_seq_Kmer_file_dir> and rename them as 1.Kmer, 2.Kmer, ..., m.Kmer, and generate a taxonomy info file like: https://drive.google.com/open?id=0BxgO-FKbbXRIZlV3ZzBBdlFpMTQ

There are three columns for each taxonomic rank in the file: the 1st column is a reissued id from 0 to m-1, where m is the total taxon num in that taxonomic rank. The 2nd column lists taxon ids, and the 3rd column lists taxon scientific names. Each row represents a species and its associated taxonomy info.
Run make build under the directory build.
Build the index. Command:

    ./build \
    <bacterial_reference_seq_associated_taxonomy_info_file(generated in Step1.3)> \
    <path_to_bacterial_reference_seq_Kmer_file_dir> \
    <shared kmer file suffixes> \
    <Kmer_length> 6 \
    <path_to_bacterial_index> \
    <path_to_a_temp_dir_for_intermediate_files>

Classifier usage

Run make classifier in the classifier directory.
Perform taxonomic classification for each metagenomics sequencing reads. Command:

    ./classifier \
    <path_to_bacterial_index> \
    <path_to_output_results_dir> \
    <Kmer_length> \
    <threads_num> \
    <fa_or_fq> \
    <SE_or_PE> \
    <bacterial_speciesId2taxoInfo_file> \ 
    <NCBI_names_file> \
    <readFile_singleEnd or readFile_end1> \
    (<readFile_end2 if paired-end reads are provided>)

<bacterial_speciesId2taxoInfo_file> can be downloaded from: https://drive.google.com/open?id=0BxgO-FKbbXRIc3FkLVFvMlpVVGM

Each row represents a species and its associated taxon ids at each taxonomic rank: species, genus, family, order, class, and phylum. Assign -1 if the taxon id is not available.

<NCBI_names_file> can be downloaded from: https://drive.google.com/open?id=0BxgO-FKbbXRIUFI2dHlBMXZhdTA

NOTE: We will keep the following files updated with the latest NCBI/refseq bacterial genome databases:

bacterial reference seq associated taxonomy info file,
bacerial index (MetaOthello index for classification)
bacterial speciesId2taxoInfo_file
NCBI names file

Also, we will release tools for generating all the above files (from NCBI/refseq bacterial genome databases) very soon.

License

Copyright (C) 2016-, University of Kentucky
Please refer to LICENSE.TXT for the detailed 'License'.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
build		build
classifier		classifier
.gitattributes		.gitattributes
.gitignore		.gitignore
ExampleUsecase.pdf		ExampleUsecase.pdf
LICENSE.TXT		LICENSE.TXT
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

metaOthello

Installation

Builder usage

Ready-built indexes indexes

Building your own index

Classifier usage

License

About

Releases

Packages

Contributors 3

Languages

License

xa6xa6/metaOthello

Folders and files

Latest commit

History

Repository files navigation

metaOthello

Installation

Builder usage

Ready-built indexes indexes

Building your own index

Classifier usage

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages