RESEQ

RESEQ is a Python script that combines mapping tools with bootstrapping to assess reproducibility in next-generation sequencing data. It takes paired-end FASTQ files as input.

Dependencies and Installation

Dependencies:

Unix based operating system
Python 2.7 (recommended)
Bowtie2
Samtools
R 3.3 (optional)

Taxonomy and Reference Data

Reference Sequences:

For virus detection you need the complete viral reference sequences, which are available from NCBI:
https://ftp.ncbi.nlm.nih.gov/refseq/release/viral/
Download the sequences, concatenate them into a single file and, if needed, remove the line breaks inside each sequence so that every sequence sits on one line.
Example template:

>genome1
AATTGGCC
>genome2
GGTTAAAC

EDIT YOUR PATH IN reseq.py IN LINE 19
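The concatenation and line-break removal described above can be sketched in Python. This is a minimal sketch, not part of RESEQ: the function names `single_line_fasta` and `concatenate_fasta` are hypothetical, and it assumes plain (uncompressed) FASTA input files.

```python
def single_line_fasta(lines):
    """Yield FASTA lines with each record's sequence joined onto one line.

    Hypothetical helper, not part of reseq.py. Blank lines are skipped and
    any sequence text that appears before the first header is ignored.
    """
    header = None
    parts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header
                yield "".join(parts)
            header, parts = line, []
        elif header is not None:
            parts.append(line)
    # Emit the last record.
    if header is not None:
        yield header
        yield "".join(parts)


def concatenate_fasta(in_paths, out_path):
    """Write all records from in_paths into one single-line FASTA file."""
    with open(out_path, "w") as out:
        for path in in_paths:
            with open(path) as handle:
                for line in single_line_fasta(handle):
                    out.write(line + "\n")


if __name__ == "__main__":
    wrapped = [">genome1\n", "AATT\n", "GGCC\n", ">genome2\n", "GGTTAAAC\n"]
    for line in single_line_fasta(wrapped):
        print(line)
```

Run on the wrapped input in the demo, this prints the four lines of the example template. The resulting file is the one whose path goes into reseq.py.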

Taxonomy Data:

You can download the taxonomy dump here: https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/
Note that to speed up the analysis you can reduce the file to only the entries you are mapping to.

EDIT YOUR PATH IN reseq.py IN LINE 22
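One way to reduce the taxonomy file is sketched below. It rests on assumptions: that the file uses the NCBI `.dmp` layout, where fields are separated by `|` and the first field is the taxonomy ID, and that you already have the set of taxonomy IDs belonging to your reference sequences. The function name `filter_dmp_lines` is hypothetical. The README does not say whether reseq.py also needs parent entries, so check that before discarding them.

```python
def filter_dmp_lines(lines, keep_tax_ids):
    """Yield only the .dmp lines whose first field is a wanted taxonomy ID.

    Hypothetical helper, not part of reseq.py. Assumes "|"-separated
    fields with the taxonomy ID in the first column.
    """
    keep = set(str(tax_id) for tax_id in keep_tax_ids)
    for line in lines:
        tax_id = line.split("|", 1)[0].strip()
        if tax_id in keep:
            yield line


def filter_dmp_file(in_path, out_path, keep_tax_ids):
    """Write the filtered copy of in_path to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in filter_dmp_lines(src, keep_tax_ids):
            dst.write(line)
```

Writing to a new file and pointing reseq.py at it keeps the original dump intact in case entries turn out to be missing.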

Bowtie2:

Follow installation here:
http://bowtie-bio.sourceforge.net/bowtie2/index.shtml

Bowtie2 needs index files for mapping. Build them from your reference data with:

$: bowtie2-build yourReferenceData.fna indexOut

EDIT THIS PATH IN reseq.py IN LINE 16, USING THE INDEX PREFIX (indexOut IN THE EXAMPLE ABOVE)

Samtools:

Follow installation here: http://www.htslib.org/

Using RESEQ

python reseq.py -i input1.fastq -j input2.fastq

reseq provides parameters to tune your run. To view all parameters, type:

python reseq.py -h

  -h, --help        show this help message and exit
  -i INPUTR1        fastQ input file R1
  -j INPUTR2        fastQ input file R2
  -n LOOPCOUNT      number of repetitions
  -o OUTPUT         output name
  -s SEQUENCECOUNT  number of sequences extracted
  -r REPLACE        "n": to draw with out replacement Default=draw with
                    replacement
  -d DELETETMP      "n": to keep all temporary files. Default value deletes
                    all tmp files
  -S DELETESAM      "n": to keep all sam / bam files. Default value deletes
                    all files
  -R USER           "y": use R to apply mixture model for False/True positive
                    likelihood predictions
  -t THREADS        number of threads
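To illustrate how the `-n`, `-s` and `-r` options relate to each other, the following sketch draws read identifiers repeatedly, with or without replacement, using one seed per repetition. It is not the code in reseq.py: the function names are hypothetical, and using the repetition index as the seed is an assumption made only for this example.

```python
import random


def draw_read_ids(read_ids, sequence_count, replace=True, seed=None):
    """Draw sequence_count read IDs from read_ids (a list).

    Hypothetical helper, not part of reseq.py.
    replace=True  draws with replacement (the same read may repeat).
    replace=False draws without replacement and raises ValueError when
    sequence_count is larger than the number of available reads.
    """
    rng = random.Random(seed)
    if replace:
        return [rng.choice(read_ids) for _ in range(sequence_count)]
    return rng.sample(read_ids, sequence_count)


def bootstrap_draws(read_ids, loop_count, sequence_count, replace=True):
    """Return one draw per repetition, seeding each with its index (assumed)."""
    return [
        draw_read_ids(read_ids, sequence_count, replace, seed=repetition)
        for repetition in range(loop_count)
    ]


if __name__ == "__main__":
    ids = ["read%d" % i for i in range(10)]
    for draw in bootstrap_draws(ids, loop_count=3, sequence_count=4):
        print(draw)
```

Because each repetition has a fixed seed, rerunning the script reproduces the same draws. For paired-end data, the drawn identifiers would then be used to pull the matching records from both FASTQ files.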

Known issues

If a data set is too small, reseq can sometimes draw the same reads from the FASTQ files. To prevent this, reseq sets a seed for each bootstrap repetition. To remove this seed, comment out line 105.
FASTQ files sometimes use different header conventions to distinguish between the paired-end files. This version of reseq is hard-coded for the following:

@xxxxxx 1:N:0:5

To change this, edit how the headers are distinguished at line 99. For example, if your header structure looks like this:

@xxxxxx/1

change lines 86 and 99 from:

R2Dict[lines[0].strip("\n").split(" ")[0]]= x

to

R2Dict[lines[0].strip("\n").split("/")[0]]= x
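The effect of the two expressions above can be checked in isolation before editing reseq.py. The helper below is a hypothetical wrapper around the same `strip` and `split` calls, with the separator passed as a parameter. It is a sketch for experimenting with your own headers, not a function that exists in reseq.py.

```python
def read_key(header_line, separator=" "):
    """Return the part of a FASTQ header shared by both mates of a pair.

    Hypothetical helper mirroring the expression used in reseq.py:
    strip the trailing newline, split on the separator, keep the first part.
    """
    return header_line.strip("\n").split(separator)[0]


if __name__ == "__main__":
    # Space-separated convention (the hard-coded default).
    print(read_key("@xxxxxx 1:N:0:5\n"))
    # Slash convention: R1 and R2 must map to the same key.
    print(read_key("@xxxxxx/1\n", separator="/"))
    print(read_key("@xxxxxx/2\n", separator="/"))
```

All three calls print `@xxxxxx`, which is what lets reads from the R1 and R2 files be matched. Note that splitting on "/" keeps only the text before the first slash, so read names that contain a slash elsewhere would need a different rule.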
