# download workflow
git clone https://github.com/thackl/detectEVE
cd detectEVE
# install dependencies via conda or mamba (https://github.com/conda-forge/miniforge)
mamba create -n detectEVE
mamba activate detectEVE
mamba env update --file workflow/envs/env.yaml
# run detectEVE
./detectEVE -h # show help
./detectEVE [options] [<in.fa> ...] # analyze local fasta files
./detectEVE [options] -a acc.csv # download & analyze NCBI accession table
./detectEVE [options] -A acc,acc # download & analyze NCBI accession list
# or combine local fasta files and remote accessions
./detectEVE [options] [(-a acc.csv | -A acc,acc)] [<in.fa> ...]
# download and prep databases
./detectEVE --setup-databases [--snake ARGS]
# run example data
cd examples
../detectEVE *.fna
See NCBI SRA WGS for downloadable accessions. Exported csv tables from the
Sequence Set Browser can be directly used with detectEVE (-a wgs_selector.csv
).
Genomes will be downloaded to genomes/<accession>.fna
.
Note, by default, downloaded genomes will be removed again automatically after
being scanned. If you want to keep them, add --notemp
to --snake
-args.
See Advanced database setup for alternatives and customization of databases.
See Known issues and https://github.com/thackl/detectEVE/issues for questions, problems or feedback.
The pipeline produces the following final files in results/
:
<genome_id>-validatEVEs.tsv
- best hit of the EVE with evidence and confidence annotation (high confidence: EVE score > 30, low confidence: EVE score > 10)<genome_id>-validatEVEs.fna
- validatEVEs nucleotide sequences<genome_id>-validatEVEs.pdf
- graphical overview of hit distribution for validatEVEs
detectEVE is based on the EVE search strategy developed by S. Lequime and previously used in the following publications:
- Lequime et al., 2017 https://doi.org/10.1093/ve/vew035
- Li et al., 2022 https://doi.org/10.1093/molbev/msac190
- Brait et al., 2023 https://doi.org/10.1093/ve/vead088
The current workflow involves the following steps:
If you need databases in a different location you can adjust db_dir
in
config.yaml
to whatever suits your system.
If you prefer to handle downloads manually or use existing files, copy any
file you don’t want detectEVE to download automatically into databases/
(or
the respective config.yaml/db_dir
) before running --setup-databases
.
Note though, unless you add --notemp
to the --snake
-arguments, all but the
final diamond-formatted database files will be deleted from databases/
at the
end of the setup phase.
cd databases/
# Latest RVDB
url=https://rvdb-prot.pasteur.fr/ &&
db=$(curl -fs $url | grep -oPm1 'files/U-RVDBv[0-9.]+-prot.fasta.xz')
curl $url/$db -o rvdb100.faa.xz
# UniRef50
wget https://ftp.uniprot.org/pub/databases/uniprot/uniref/uniref50/uniref50.fasta.gz
# NCBI taxonomy
wget https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/prot.accession2taxid.FULL.gz
wget https://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -xzf taxdump.tar.gz nodes.dmp names.dmp
If you encounter an error related to tidyverse/stringi/libicui18n.so.58, try
reinstalling stringi
locally. To restart the workflow from where it failed,
just run the same command again.
mamba remove r-stringi r-tidyverse
R -e 'install.packages("stringi")'
mamba install r-tidyverse
diamond v2.1.9 has a bug and does not output send
correctly when using
long-read mode (–range-culling). Since at this point v2.1.9 is the latest
diamond version, detectEVE defaults to diamond v2.1.8 to avoid this bug.