GitHub - thackl/detectEVE: Find endogenous viral elements in genomes

Usage

# download workflow
git clone https://github.com/thackl/detectEVE
cd detectEVE

# install dependencies via conda or mamba (https://github.com/conda-forge/miniforge)
mamba create -n detectEVE
mamba activate detectEVE
mamba env update --file workflow/envs/env.yaml

# run detectEVE
./detectEVE -h                       # show help
./detectEVE [options] [<in.fa> ...]  # analyze local fasta files
./detectEVE [options] -a acc.csv     # download & analyze NCBI accession table
./detectEVE [options] -A acc,acc     # download & analyze NCBI accession list

# or combine local fasta files and remote accessions
./detectEVE [options] [(-a acc.csv | -A acc,acc)] [<in.fa> ...]

# download and prep databases
./detectEVE --setup-databases [--snake ARGS]

# run example data
cd examples
../detectEVE *.fna

See NCBI SRA WGS for downloadable accessions. Exported csv tables from the Sequence Set Browser can be directly used with detectEVE (-a wgs_selector.csv). Genomes will be downloaded to genomes/<accession>.fna.

Note, by default, downloaded genomes will be removed again automatically after being scanned. If you want to keep them, add --notemp to --snake-args.

See Advanced database setup for alternatives and customization of databases.

See Known issues and https://github.com/thackl/detectEVE/issues for questions, problems or feedback.

Output

The pipeline produces the following final files in results/:

<genome_id>-validatEVEs.tsv - best hit of the EVE with evidence and confidence annotation (high confidence: EVE score > 30, low confidence: EVE score > 10)
<genome_id>-validatEVEs.fna - validatEVEs nucleotide sequences
<genome_id>-validatEVEs.pdf - graphical overview of hit distribution for validatEVEs

Workflow overview and background

detectEVE is based on the EVE search strategy developed by S. Lequime and previously used in the following publications:

Lequime et al., 2017 https://doi.org/10.1093/ve/vew035
Li et al., 2022 https://doi.org/10.1093/molbev/msac190
Brait et al., 2023 https://doi.org/10.1093/ve/vead088

The current workflow involves the following steps:

Advanced database setup

If you need databases in a different location you can adjust db_dir in config.yaml to whatever suits your system.

If you prefer to handle downloads manually or use existing files, copy any file you don’t want detectEVE to download automatically into databases/ (or the respective config.yaml/db_dir) before running --setup-databases.

Note though, unless you add --notemp to the --snake-arguments, all but the final diamond-formatted database files will be deleted from databases/ at the end of the setup phase.

cd databases/

# Latest RVDB
url=https://rvdb-prot.pasteur.fr/ && 
db=$(curl -fs $url | grep -oPm1 'files/U-RVDBv[0-9.]+-prot.fasta.xz')
curl $url/$db -o rvdb100.faa.xz

# UniRef50
wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz

# NCBI taxonomy
wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -xzf taxdump.tar.gz nodes.dmp names.dmp

Known issues

tidyverse stringi libicui

If you encounter an error related to tidyverse/stringi/libicui18n.so.58, try reinstalling stringi locally. To restart the workflow from where it failed, just run the same command again.

mamba remove r-stringi r-tidyverse
R -e 'install.packages("stringi")'
mamba install r-tidyverse

diamond v2.1.9 `send` bug

diamond v2.1.9 has a bug and does not output send correctly when using long-read mode (–range-culling). Since at this point v2.1.9 is the latest diamond version, detectEVE defaults to diamond v2.1.8 to avoid this bug.

Name		Name	Last commit message	Last commit date
Latest commit History 96 Commits
databases		databases
examples		examples
figures		figures
workflow		workflow
.Rhistory		.Rhistory
.gitignore		.gitignore
LICENSE		LICENSE
README.org		README.org
config.yaml		config.yaml
detectEVE		detectEVE

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

databases

databases

examples

examples

figures

figures

workflow

workflow

.Rhistory

.Rhistory

.gitignore

.gitignore

LICENSE

LICENSE

README.org

README.org

config.yaml

config.yaml

detectEVE

detectEVE

Repository files navigation

Usage

Output

Workflow overview and background

Advanced database setup

Known issues

tidyverse stringi libicui

diamond v2.1.9 `send` bug

About

Releases

Packages

Contributors 3

Languages

License

thackl/detectEVE

Folders and files

Latest commit

History

Repository files navigation

Usage

Output

Workflow overview and background

Advanced database setup

Known issues

tidyverse stringi libicui

diamond v2.1.9 send bug

About

Resources

License

Stars

Watchers

Forks

Languages

diamond v2.1.9 `send` bug