Code used for the zoonotic potential part of Poisot et al. (2021) "Imputing the mammalian virome with linear filtering and singular value decomposition". See viralemergence/trefle for the main code of this paper.
This repository is derived from zoonotic_rank, described in Mollentze et al. (2020).
- Install the conda package manager
- All other requirements can be installed in a self-contained environment using:
conda env create -f environment.yml
The commands below will repeat all analyses described in the manuscript. Note that making predictions for novel viruses is not currently possible with these models, but can be done with the original genome feature-based models (see zoonotic_rank). Running the full pipeline will take several days (approximately 3 days on a 16-core AMD 5950X processor).
conda activate haystack
make
Use make help
to see individual steps in the pipeline and further instructions.
(folders which will be created during model training are indicated by [*]
)
└─haystack/
├─Makefile ................................. Workflow specification
├─environment.yml .......................... Record of software/package versions
│ used
├─options.config ........................... Runtime options (specifies number
│ of parallel threads allowed and
│ the random seed)
├─InternalData/ ............................ Data unique to this project
│ └─svd_curated_accessions.csv ........... Accession numbers for each virus
│ species in the clover database
│ (where a full genome is available)
├─CalculatedData/ .......................... [*] Intermediate calculations
├─ExternalData/ ............................ [*] Data from external sources,
│ dowloaded as needed (see Makefile)
├─Misc/ .................................... Scripts to download external data
├─Plots/ ................................... [*] Final plots generated
├─RunData/ ................................. [*] Trained models
├─Scripts/ ................................. Main analysis and plotting scripts
└─Utils/ ................................... Utility functions and tools called
by other scripts, including CPB_Machine.jar
from https://github.com/rjorton/VirusFeatures