DVGfinder is a tool that integrates the most widely used defective viral genome (DVG) search algorithms, ViReMa-a and DI-tector, in a single workflow, making the analysis of a sample easier and more intuitive.
DVGfinder_v3 also implements a Gradient Boosting classifier that aims to reduce the number of false positives introduced by the search algorithms, and it generates an HTML report with interactive tables and plots to facilitate a first exploration of the results.
- ViReMa-a (0.23):
Routh A, Johnson JE. Discovery of functional genomic motifs in viruses with ViReMa-a Virus Recombination Mapper-for analysis of next-generation sequencing data. Nucleic Acids Res. 2014 Jan;42(2):e11. doi: 10.1093/nar/gkt916. Epub 2013 Oct 16. PMID: 24137010; PMCID: PMC3902915.
- DI-tector_06.py:
Beauclair G, Mura M, Combredet C, Tangy F, Jouvenet N, Komarova AV. DI-tector: defective interfering viral genomes' detector for next-generation sequencing data. RNA. 2018 Oct;24(10):1285-1296. doi: 10.1261/rna.066910.118. Epub 2018 Jul 16. PMID: 30012569; PMCID: PMC6140465.
To get a local copy up and running follow these simple steps.
DVGfinder uses the ViReMa-a (v0.23) and DI-tector_06.py programs.
These third-party scripts are in the ExternalNeeds directory, so you only have to follow the next steps to run DVGfinder.
- Clone the repo in the directory of your choice:
  git clone https://github.com/MJmaolu/DVGfinder.git
- Go to the DVGfinder directory:
  cd DVGfinder
- Give execution permission to all the scripts in the DVGfinder directory:
  chmod -R +x .
- Create a new conda environment with all the dependencies needed to run DVGfinder:
  conda env create -f dvgfinder_env.yaml
- Activate the DVGfinder environment:
  conda activate dvgfinder_env
python3 DVGfinder_v3.1.py -fq path_to_fastq_file [-r path_to_fasta_virus_reference] [-m margin] [-t threshold] [-n number_processes] [-s polarity]
-r: The FASTA file of the viral reference and its bwa and bowtie index files must all be in the ExternalNeeds/references directory.
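Before launching a run, it can be useful to check that the reference and its indexes are in place. The following is a minimal sketch, not part of DVGfinder: the helper name `missing_reference_files` is hypothetical, and it assumes that the indexes were built with the FASTA file name as prefix and that "bowtie" means Bowtie 1 (`.ebwt` files). If your bowtie index uses a different prefix, pass it as `bowtie_prefix`.

```python
from pathlib import Path
from typing import List, Optional

# File suffixes written by `bwa index` and by Bowtie 1's `bowtie-build`.
BWA_SUFFIXES = [".amb", ".ann", ".bwt", ".pac", ".sa"]
BOWTIE1_SUFFIXES = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                    ".rev.1.ebwt", ".rev.2.ebwt"]


def missing_reference_files(
    fasta_name: str,
    ref_dir: str = "ExternalNeeds/references",
    bowtie_prefix: Optional[str] = None,
) -> List[str]:
    """Return the expected reference files that are absent from ref_dir.

    Assumption: index files are named <prefix><suffix>, where the prefix
    defaults to the FASTA file name for both bwa and bowtie.
    """
    directory = Path(ref_dir)
    prefix = bowtie_prefix or fasta_name
    expected = [fasta_name]
    expected += [fasta_name + suffix for suffix in BWA_SUFFIXES]
    expected += [prefix + suffix for suffix in BOWTIE1_SUFFIXES]
    return [name for name in expected if not (directory / name).is_file()]
```

For example, `missing_reference_files("TuMV-AS.fasta")` returns an empty list when everything this sketch expects is present, and otherwise lists the file names it could not find.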
You can explore an example of results in the directory 'tumvas72_N100K_l100'.
To test the program follow the next steps:
- Activate the environment
- Run DVGfinder on the example sample:
  python3 DVGfinder_v3.1.py -fq tumvas72_N100K_l100/tumvas72_N100K_l100.fq -r ExternalNeeds/references/TuMV-AS.fasta -n 4
- Wait for the run to finish; your results will appear in the 'FinalReports' directory. In addition, an HTML report will open in your default browser.
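When several samples have to be processed, the command can also be assembled from Python. The sketch below only builds the argument list from the flags shown in the usage line (`-fq`, `-r`, `-m`, `-t`, `-n`, `-s`); the helper name `build_dvgfinder_command` is hypothetical, and the value types are assumptions. Options left as `None` are omitted so that DVGfinder's own defaults apply.

```python
from typing import List, Optional


def build_dvgfinder_command(
    fastq: str,
    reference: Optional[str] = None,
    margin: Optional[int] = None,
    threshold: Optional[float] = None,
    n_processes: Optional[int] = None,
    polarity: Optional[str] = None,
    script: str = "DVGfinder_v3.1.py",
) -> List[str]:
    """Return the argument list for one DVGfinder run (hypothetical helper)."""
    cmd = ["python3", script, "-fq", fastq]
    optional_flags = [
        ("-r", reference),
        ("-m", margin),
        ("-t", threshold),
        ("-n", n_processes),
        ("-s", polarity),
    ]
    # Append only the options the caller actually set.
    for flag, value in optional_flags:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    example = build_dvgfinder_command(
        "tumvas72_N100K_l100/tumvas72_N100K_l100.fq",
        reference="ExternalNeeds/references/TuMV-AS.fasta",
        n_processes=4,
    )
    print(" ".join(example))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the DVGfinder directory with the conda environment active.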
Link to an example HTML report
- The results are displayed at three levels (tabs at the top): ALL, CONSENSUS and FILTERED ML.
- CONSENSUS and FILTERED ML only appear if both search algorithms have identified DVGs in the sample.
- The same information displayed in the interactive tables is also written to CSV files.
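Because the tables are also saved as CSV, they can be inspected outside the HTML report. The sketch below counts the data rows of every CSV file found under a results directory. It assumes the files are somewhere under `FinalReports` and have a header row; it makes no assumption about file or column names. `summarize_csv_reports` is a hypothetical helper, not part of DVGfinder.

```python
import csv
from pathlib import Path
from typing import Dict


def summarize_csv_reports(report_dir: str = "FinalReports") -> Dict[str, int]:
    """Map each CSV file under report_dir to its number of data rows."""
    root = Path(report_dir)
    counts: Dict[str, int] = {}
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="") as handle:
            # DictReader consumes the header, so only data rows are counted.
            counts[str(path.relative_to(root))] = sum(
                1 for _ in csv.DictReader(handle)
            )
    return counts


if __name__ == "__main__":
    for name, n_rows in summarize_csv_reports().items():
        print(f"{name}: {n_rows} rows")
```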
The dataset used to generate the classifier is available as '630N5Ml100_v2_metrics_labeledDataset.csv'.
For this version of DVGfinder we have used a Gradient Boosting Classifier algorithm to generate the model, but we encourage you to work directly with the data and try to improve it.
- Olmo-Uceda MJ, Muñoz-Sánchez JC, Lasso-Giraldo W, Arnau V, Díaz-Villanueva W, Elena SF. DVGfinder: A Metasearch Tool for Identifying Defective Viral Genomes in RNA-Seq Data. Viruses. 2022 May 23;14(5):1114. doi: 10.3390/v14051114. PMID: 35632855.
María José Olmo-Uceda - mariajose.olmo@csic.es
PhD student
EvolSysVir Group, I2SysBio (CSIC-UV)
Project Link: https://github.com/MJmaolu/DVGfinder
Page Link: https://mjmaolu.github.io/DVGfinder/
Repository with Synthetic Data: https://github.com/MJmaolu/SyntheticSamplesWithDVGs
Repository with DVGfinder results on Real Data: https://github.com/MJmaolu/results-DVGfinder
Under Construction
Any suggestions will be welcome 🤗