ISV (Intepretation of Structural Variants)

Structural variants are regional genome mutations that have the potential to affect the phenotype of an individual. The prediction of their pathogenicity is an intricate problem.

Here we show that the pathogenicity of Copy Number Variants (CNVs) is encoded in features describing the CNV as a whole (such as number of gene elements, regulatory regions or known overlapped CNVs).

With carefully selected model hyperparameters, one can achieve more than 99% accuracy for both copy number loss and copy number gain variants.

Annotation and Prediction of novel CNVs

The annotation, prediction of pathogenicity as well as calculation of shap values for novel cnvs can be easily done using the isv python package: https://pypi.org/project/isv/

or as a command line tool. Follow instructions at https://github.com/tsladecek/isv_package for more info

Project directory Structure

The data/ folder contains raw data that was used for training the models and for the evaluation
- in the root of data/ folder one can find annotated cnvs from clinvar, separated into several datasets (train, validation, test, test-long, test-multiple) for each cnv type (loss and gain)
- annotations by annotsv and classifycnv are inside of separate directories under the same name
- annotated gnomad dataset, microdeletions and microduplications, genome cnvs and five studied syndromes are under directory data/evaludation_data
The results/ directory contains trained models with their gridsearch results as well as other tables
The plots/ directory contains main and supplementary figures
The scripts/ directory contains scripts for generating the results and training of models
The rules/ directory contains snakemake definitions, which are collected by the Snakefile at the root of the repository

To reproduce entire workflow, except of the circos plot and stains plot, run:

Create conda environment

we recommend using conda for best chance of reproducibility. The versions of all of the packages are listed in the environment.yml file
The scripts will likely only work on a linux machine

conda env create --file environment.yml 

conda activate ISV

Train models and generate all plots and tables

snakemake --cores <number of cores>

To reproduce the circos plot and the stains comparison plot see scripts/plots/cnvs_circular.Rmd and scripts/plots/chromosome_stains_study.Rmd

These scripts are written in R language and require tidyverse and circlize packages to be installed.

If you mention or use the ISV tool, please cite our article

https://www.nature.com/articles/s41598-021-04505-z

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ISV (Intepretation of Structural Variants)

Annotation and Prediction of novel CNVs

If you mention or use the ISV tool, please cite our article

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 144 Commits
data		data
plots		plots
results		results
rules		rules
scripts		scripts
.gitignore		.gitignore
License.txt		License.txt
README.md		README.md
Snakefile		Snakefile
config.yaml		config.yaml
environment.yml		environment.yml

License

tsladecek/isv_cnv

Folders and files

Latest commit

History

Repository files navigation

ISV (Intepretation of Structural Variants)

Annotation and Prediction of novel CNVs

If you mention or use the ISV tool, please cite our article

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages