HPViewer: sensitive and specific genotyping of human papillomavirus in metagenomic DNA

Description

HPViewer is a tool for genotyping and quantifying HPV from metagenomic or human genomic shotgun sequencing data. It improves sensitivity and specificity by masking nonspecific sequences in the reference genomes and by identifying HPV short DNA reads directly. It contains two HPV databases built with different masking strategies (repeat-mask and homology-mask) and one homology distance matrix that is used to choose between the two databases.

If you use the HPViewer software, please cite our manuscript:

Yuhan Hao, Liying Yang, Antonio Galvao Neto, Milan R Amin, Dervla Kelly, Stuart M Brown, Ryan C Branski, Zhiheng Pei; HPViewer: sensitive and specific genotyping of human papillomavirus in metagenomic DNA, Bioinformatics, bty037, https://doi.org/10.1093/bioinformatics/bty037

Installation

$ git clone https://github.com/yuhanH/HPViewer.git

Pre-requisites

Python (2.7+)

Python modules (sys, getopt, subprocess); all three are part of the Python standard library, so no separate installation is needed

Bowtie2: http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml

SAMtools: http://www.htslib.org/

Bedtools: http://bedtools.readthedocs.io/en/latest/
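Before running HPViewer, it can help to confirm that the three external tools are reachable on your PATH. The following is a minimal sketch, not part of HPViewer itself. It assumes Python 3.3 or later (for shutil.which) and assumes the executables are installed under their usual names (bowtie2, samtools, bedtools); the helper name missing_tools is hypothetical.

```python
import shutil

# Assumption: the tools are installed under their usual executable names.
REQUIRED_TOOLS = ["bowtie2", "samtools", "bedtools"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH: " + ", ".join(missing))
else:
    print("All required tools were found.")
```

If any tool is reported missing, install it or add its directory to PATH before running HPViewer.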

Parameters

Required

a) input files (-U or -1 -2): FASTQ files (plain fastq or gzipped fastq.gz), either unpaired (-U unpaired.fastq) or paired R1/R2 (-1 R1.fastq -2 R2.fastq)

b) output file name (-o)

Optional

a) database mask type (-m): hybrid-mask (default), repeat-mask, or homology-mask.

If you set -m, place it before the read input options (for example, -m repeat-mask -1 R1.fastq -2 R2.fastq). Repeat-mask is the more sensitive mode. Homology-mask is recommended when some HPV types are present in high abundance, which may otherwise cause false positives for other HPV types.

b) number of threads used in the Bowtie2 alignment (-p)

c) minimum coverage threshold for calling an HPV type present (-c); the default is 150 bp (1.5 × the average length of your reads).
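The options above can be assembled programmatically, for example when running HPViewer over many samples. The sketch below is illustrative only: build_hpviewer_command and suggested_min_coverage are hypothetical helpers, not part of HPViewer. It places -m before the read inputs as required above. Putting -p and -c before the reads as well is a conservative assumption, not something this README requires. The coverage helper simply applies the 1.5 × average read length rule of thumb stated for -c.

```python
VALID_MASKS = ("hybrid-mask", "repeat-mask", "homology-mask")


def suggested_min_coverage(read_lengths):
    """Return 1.5 x the average read length, rounded to the nearest bp."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    mean_length = float(sum(read_lengths)) / len(read_lengths)
    return int(round(1.5 * mean_length))


def build_hpviewer_command(output, unpaired=None, r1=None, r2=None,
                           mask=None, threads=None, min_coverage=None,
                           script="HPViewer.py"):
    """Assemble an argument list for HPViewer.py (hypothetical helper)."""
    paired = r1 is not None and r2 is not None
    if unpaired is None and not paired:
        raise ValueError("give either unpaired, or both r1 and r2")
    if unpaired is not None and (r1 is not None or r2 is not None):
        raise ValueError("give unpaired or paired reads, not both")
    if mask is not None and mask not in VALID_MASKS:
        raise ValueError("unknown mask type: %s" % mask)

    cmd = ["python", script]
    if mask is not None:
        cmd += ["-m", mask]               # -m must precede the read inputs
    if threads is not None:
        cmd += ["-p", str(threads)]       # placed early as a precaution
    if min_coverage is not None:
        cmd += ["-c", str(min_coverage)]  # placed early as a precaution
    if unpaired is not None:
        cmd += ["-U", unpaired]
    else:
        cmd += ["-1", r1, "-2", r2]
    cmd += ["-o", output]
    return cmd


# Example: paired reads, repeat-mask database, 4 threads.
threshold = suggested_min_coverage([100, 100, 100])
print(" ".join(build_hpviewer_command(
    "TEST", r1="R1.fastq", r2="R2.fastq",
    mask="repeat-mask", threads=4, min_coverage=threshold)))
```

The returned list can be passed to subprocess.call to run the command without going through a shell.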

Results

a) output_HPV_summary.txt has three columns: the HPV types present, the number of reads per kilobase (RPK) for each matching HPV type, and the number of reads assigned to that type.

b) alignment results from Bowtie2: output.sam, output.bam
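The three-column summary can be loaded for downstream analysis with a few lines of Python. This is a sketch under stated assumptions: the README does not specify the delimiter or whether the file has a header row, so the hypothetical parse_hpv_summary helper below splits on any whitespace and skips lines whose second and third fields are not numeric. The sample lines are made up for illustration and are not real HPViewer output.

```python
def parse_hpv_summary(lines):
    """Parse summary lines into dicts with type, RPK, and read count.

    Assumptions: fields are whitespace-separated, and any line that does
    not have exactly three fields with numeric values in the second and
    third fields (for example a header row) is skipped.
    """
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        hpv_type, rpk_text, reads_text = fields
        try:
            rpk = float(rpk_text)
            reads = int(float(reads_text))
        except ValueError:
            continue  # non-numeric fields, e.g. a header row
        records.append({"type": hpv_type, "rpk": rpk, "reads": reads})
    return records


# Made-up sample lines for illustration only.
sample = ["HPV_type\tRPK\tcount", "HPV16\t12.5\t98"]
for record in parse_hpv_summary(sample):
    print(record["type"], record["rpk"], record["reads"])
```

To read a real result file, pass an open file object instead of the sample list, since iterating over a file yields its lines.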

Basic Usage (demo)

python HPViewer.py -U test_unpaired.fastq -o TEST
more TEST/TEST_HPV_profile.txt

Work Flow
