HoCoRT - Remove specific organisms from sequencing reads

HoCoRT

Host Contamination Removal Tool (HoCoRT)

Removes specific organisms from sequencing reads on Linux and macOS.

Supports paired and unpaired FASTQ input. Output is written in FASTQ format.

Dependencies

Python 3.7+
External programs:

  • Bowtie2 (Tested with version 2.4.5)
  • HISAT2 (Tested with version 2.2.1)
  • Kraken2 (Tested with version 2.1.2)
  • bwa-mem2 (Tested with version 2.2.1)
  • BBMap (Tested with version 38.96)
  • Minimap2 (Tested with version 2.24)
  • samtools (Tested with version 1.15)
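
Before running a pipeline, it can help to check which of these external programs are available on the PATH. The following is a minimal sketch and not part of HoCoRT; the executable names (in particular `bbmap.sh` for BBMap) are assumptions about a typical installation and may differ on your system.

```python
import shutil

# Executable names are assumptions about a typical install, not taken from HoCoRT.
EXTERNAL_TOOLS = {
    "Bowtie2": "bowtie2",
    "HISAT2": "hisat2",
    "Kraken2": "kraken2",
    "bwa-mem2": "bwa-mem2",
    "BBMap": "bbmap.sh",
    "Minimap2": "minimap2",
    "samtools": "samtools",
}


def find_tools(tools=EXTERNAL_TOOLS):
    """Map each program name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(exe) for name, exe in tools.items()}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Only the tools used by the pipeline you intend to run need to be present; this check does not verify versions.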

Installing with Bioconda

Install HoCoRT from the Bioconda channel:

conda install -c bioconda hocort

Manual Installation

First, make sure that no Conda environment named "hocort" already exists.
Then download the necessary files:

wget https://raw.githubusercontent.com/ignasrum/hocort/main/install.sh && wget https://raw.githubusercontent.com/ignasrum/hocort/main/environment.yml

After downloading the files, run the installation bash script to install HoCoRT:

bash ./install.sh

Once the installation has finished, activate the Conda environment:

conda activate hocort

Using HoCoRT

Pipeline naming

Pipelines are named after the tools they utilize. For example, the pipeline bowtie2 uses Bowtie2 to map the reads, and kraken2bowtie2 first classifies using Kraken2, then maps using Bowtie2.

Building indexes

Indexes are required to map sequences. They may be built either manually or with "hocort index", which simplifies the process. A Bowtie2 index may be built using "hocort index" with the following command:

hocort index bowtie2 --input genome.fasta --output dir/basename

To remove multiple organisms from sequencing reads, build the index from an input FASTA file that contains all of their genomes, for example by concatenating them:

cat genome1.fasta genome2.fasta > combined.fasta
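
The same concatenation can be done from Python. This is a minimal sketch using only the standard library; `combine_fasta` is a hypothetical helper, not a HoCoRT function. Unlike plain `cat`, it inserts a newline when a file does not end with one, so that the next genome's header line is not glued onto the previous sequence line.

```python
def combine_fasta(inputs, output, chunk_size=1 << 20):
    """Concatenate FASTA files into `output`, streaming in chunks."""
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"  # an empty file needs no separator
            with open(path, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    out.write(chunk)
                    last = chunk[-1:]
            # Guarantee that the next file starts on a fresh line.
            if last != b"\n":
                out.write(b"\n")


if __name__ == "__main__":
    import os
    import tempfile

    # Tiny demonstration with throwaway files.
    tmp = tempfile.mkdtemp()
    g1 = os.path.join(tmp, "genome1.fasta")
    g2 = os.path.join(tmp, "genome2.fasta")
    with open(g1, "w") as f:
        f.write(">genome1\nACGT")  # note: no trailing newline
    with open(g2, "w") as f:
        f.write(">genome2\nTTGA\n")
    combined = os.path.join(tmp, "combined.fasta")
    combine_fasta([g1, g2], combined)
    print(open(combined).read())
```

The resulting combined.fasta can then be passed to "hocort index" via --input.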

Paired end run

To map paired-end reads and write the mapped or unmapped reads to two output files, use the following command:

hocort map bowtie2 -x dir/basename -i input1.fastq input2.fastq -o out1.fastq out2.fastq

Single end run

Same as the paired-end run, but with one input file and one output file:

hocort map bowtie2 -x dir/basename -i input1.fastq -o out1.fastq
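
When many samples need processing, the two command forms above can be generated from a script. The sketch below only builds the argument list using the flags shown in this README (-x, -i, -o); `build_map_command` is a hypothetical helper, and nothing is executed unless you pass the list to `subprocess.run` yourself.

```python
def build_map_command(pipeline, index, inputs, outputs):
    """Build the argv list for `hocort map` with one (single-end) or two (paired-end) files."""
    if len(inputs) not in (1, 2) or len(inputs) != len(outputs):
        raise ValueError("expected 1 or 2 input files and the same number of output files")
    return ["hocort", "map", pipeline, "-x", index, "-i", *inputs, "-o", *outputs]


if __name__ == "__main__":
    paired = build_map_command(
        "bowtie2",
        "dir/basename",
        ["input1.fastq", "input2.fastq"],
        ["out1.fastq", "out2.fastq"],
    )
    single = build_map_command("bowtie2", "dir/basename", ["input1.fastq"], ["out1.fastq"])
    print(" ".join(paired))
    print(" ".join(single))
    # To run for real (requires HoCoRT on PATH):
    #   import subprocess; subprocess.run(paired, check=True)
```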

Compressed input/output

Most pipelines support gzip-compressed (.gz) input and output. No extra configuration is required beyond giving the filenames a ".gz" extension.
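
If your reads are not yet compressed, a gzip copy can be produced with the Python standard library. This is a minimal sketch; `gzip_file` is a hypothetical helper and not part of HoCoRT.

```python
import gzip
import shutil


def gzip_file(path):
    """Write a gzip-compressed copy of `path` next to it and return the new filename."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # streams, so large FASTQ files are fine
    return gz_path
```

For example, `gzip_file("input1.fastq")` writes input1.fastq.gz, which can then be given to -i in place of the uncompressed file.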

Removing host contamination

The "--filter true/false" argument switches between outputting unmapped and mapped sequences. For example, if the reads are contaminated with human sequences and the index was built from the human genome, use "--filter true" to output only the unmapped sequences (everything except the human reads).

Extracting specific sequences

The "--filter true/false" argument may also be used to extract specific sequences. First, build the index from the genomes of the organisms to extract. Second, map the sequencing reads with "--filter false" to output only the mapped sequences (those which map to the index containing the genomes of the specific organisms).
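
Because "true" means "keep what did not map", the two modes are easy to mix up. The sketch below names them explicitly; the mode names "remove" and "extract" are hypothetical labels chosen for this example, and only the "--filter true/false" values come from the descriptions above.

```python
def filter_argument(mode):
    """Return the `--filter` argument for a given goal.

    "remove"  -> --filter true  (output unmapped reads: contamination removed)
    "extract" -> --filter false (output mapped reads: only the indexed organisms)
    """
    values = {"remove": "true", "extract": "false"}
    if mode not in values:
        raise ValueError(f"mode must be one of {sorted(values)}")
    return ["--filter", values[mode]]


if __name__ == "__main__":
    print(" ".join(filter_argument("remove")))
    print(" ".join(filter_argument("extract")))
```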

Advanced usage

Importing and using HoCoRT in Python

HoCoRT can be imported in Python scripts and programs with "import hocort". This allows precise configuration of the tools being run.

import hocort.pipelines.bowtie2 as bowtie2

idx = "dir/basename"
seq1 = "in1.fastq"
seq2 = "in2.fastq"
out1 = "out1.fastq"
out2 = "out2.fastq"
# options is passed to the aligner/mapper, which allows precise configuration
options = ["--local", "--very-fast-local"]

returncode = bowtie2.run(idx, seq1, out1, seq2=seq2, out2=out2, options=options)
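
A script will usually want to stop when a run fails. The sketch below wraps any pipeline "run" function and raises on a nonzero result. It assumes that a return code of 0 means success, which the example above does not state explicitly; `run_checked` is a hypothetical helper, and a stand-in function is used so the sketch runs without HoCoRT installed.

```python
def run_checked(run, *args, **kwargs):
    """Call a pipeline's run function and raise if it reports a nonzero return code.

    Assumption: 0 means success, as with conventional process exit codes.
    """
    returncode = run(*args, **kwargs)
    if returncode != 0:
        raise RuntimeError(f"pipeline failed with return code {returncode}")
    return returncode


if __name__ == "__main__":
    # Stand-in with the same call shape as bowtie2.run in the example above.
    def fake_run(idx, seq1, out1, seq2=None, out2=None, options=None):
        return 0

    run_checked(
        fake_run,
        "dir/basename",
        "in1.fastq",
        "out1.fastq",
        seq2="in2.fastq",
        out2="out2.fastq",
        options=["--local", "--very-fast-local"],
    )
    print("run finished")
```

With HoCoRT installed, `fake_run` would be replaced by `bowtie2.run`.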

Passing arguments to the underlying tools

It is possible to pass arguments to the underlying tools by specifying them in the -c/--config argument like this:

hocort map bowtie2 -c="--local --very-fast-local --score-min G,21,9"
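
When the command is assembled in a script, the tool options have to end up in a single argument, as the quotes do in the shell command above. The following sketch shows one way to do that; `config_argument` is a hypothetical helper, and the options are the Bowtie2 flags from the example.

```python
import shlex


def config_argument(tool_options):
    """Join tool options into one `-c=...` argument for the command line."""
    return "-c=" + " ".join(tool_options)


if __name__ == "__main__":
    arg = config_argument(["--local", "--very-fast-local", "--score-min", "G,21,9"])
    # As one element of an argv list, no shell quoting is needed.
    argv = ["hocort", "map", "bowtie2", arg]
    # For display or pasting into a shell, quote each element.
    print(" ".join(shlex.quote(a) for a in argv))
```

Passing the list form to `subprocess.run` avoids shell quoting issues entirely, since the whole option string stays one argument.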

Wiki

Wiki Homepage

Technical documentation

https://ignasrum.github.io/hocort/
