Variant annotation and prioritization pipeline

Overview

Variant annotation in cancer genomics identifies and characterizes the genetic changes (variants) that contribute to cancer development and progression. Many types of variants occur in the genome, and only some of them are relevant to cancer, so accurate annotation is critical for identifying key driver mutations and designing targeted therapies. The process is complicated by the large number of candidate variants, the need to integrate data from multiple sources, and the ongoing discovery of new cancer-associated variants.

We have developed a Nextflow pipeline called variantalker that annotates variants from VCF files. The pipeline supports VCF files generated by DRAGEN, nf-core/sarek, and Ion Torrent platforms.

BETA version: variantalker can also extract biomarkers such as TMB, mutational signatures (APOBEC, UV, and tobacco), clonal TMB (if BAM/CRAM files and sex are provided), expression of specific genes (if RNA-seq data are provided), and gene CNV, among others. For more information, see the biomarker documentation in docs/biomarkers.

Installation

Clone the repo

git clone https://github.com/zhanyinx/variantalker.git

variantalker relies on Annovar software and Funcotator databases.

Download the updated databases. Separate repositories for hg19 and hg38 are available.

wget -r -N --no-parent -nH --cut-dirs=3 -P public_databases/hg38 https://bioserver.ieo.it/repo/dima/hg38 
wget -r -N --no-parent -nH --cut-dirs=3 -P public_databases/hg19 https://bioserver.ieo.it/repo/dima/hg19

Documentation

The pipeline employs several tools to annotate and prioritize variants:

Funcotator for variant annotation
CancerVar for somatic variant prioritization
InterVar for germline variant annotation
Annovar: CancerVar and InterVar rely on Annovar.
CIViC: somatic variant classification using CIViC evidence level.
AlphaMissense: somatic and germline variant prioritization.

To keep the annotations accurate, the Funcotator and Annovar databases must be updated regularly using the provided update utilities (see the update_db folder).

Usage

If you are using variantalker for the first time, consider updating the databases first by following the database update instructions.

Modify the configuration file (nextflow.config) by setting the following parameters:

funcotator_germline_db: e.g. path2/public_databases/funcotator_dataSources.v1.7.20200521g
funcotator_somatic_db: e.g. path2/public_databases/funcotator_dataSources.v1.7.20200521s
annovar_db: e.g. path2/public_databases/humandb
annovar_software_folder: e.g. path2/annovar
alpha_mis_genome_basedir: e.g. path2/public_databases
fasta: path to fasta file used to generate the vcf
target: path to the target bed file
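
Because every parameter above is a filesystem path, a quick pre-flight check can catch typos before a run starts. The following is a minimal sketch, not part of variantalker: the helper name `missing_paths` and the idea of passing the parameters as a Python dict are assumptions made for illustration; only the parameter names come from the list above.

```python
from pathlib import Path

# Parameter names copied from the list above.
PATH_PARAMS = [
    "funcotator_germline_db",
    "funcotator_somatic_db",
    "annovar_db",
    "annovar_software_folder",
    "alpha_mis_genome_basedir",
    "fasta",
    "target",
]


def missing_paths(params: dict) -> list:
    """Return the names of parameters that are unset or whose path does not exist.

    Hypothetical helper: `params` maps parameter names to the values you
    intend to put in nextflow.config.
    """
    problems = []
    for name in PATH_PARAMS:
        value = params.get(name)
        if not value or not Path(value).exists():
            problems.append(name)
    return problems
```

An empty list means every configured path exists; otherwise the returned names indicate which entries to fix in the configuration file.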

The main command for annotation is:

nextflow run path_to/main.nf -c yourconfig -profile singularity --input samplesheet.csv --outdir outdir

To list all available parameters, including hidden ones:

nextflow run path_to/main.nf --help --show_hidden_params
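
When the pipeline is launched from a script, it can help to assemble the annotation command as an argument list rather than a single string. The sketch below only mirrors the flags shown above; the function name `build_command` is hypothetical and the placeholder paths are the same ones used in the command above.

```python
import shlex


def build_command(main_nf, config, samplesheet, outdir, profile="singularity"):
    """Assemble the annotation command as an argument list (hypothetical helper)."""
    return [
        "nextflow", "run", main_nf,
        "-c", config,
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
    ]


cmd = build_command("path_to/main.nf", "yourconfig", "samplesheet.csv", "outdir")
# Print the command for inspection before running it.
print(shlex.join(cmd))
# To launch it from Python: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when paths contain spaces.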

Input

variantalker takes as input a CSV samplesheet with 4 columns:

IMPORTANT: the header row is required.

patient	tumor_tissue	sample_file	sample_type
patient1	Lung	path/tumor.vcf.gz	somatic
.....	.....	.....	.....

sample_file must be given as a full (absolute) path, not a relative path.

Available sample_type are: somatic, germline, cnv.

somatic: tumor-only (single-sample) or tumor-normal (multi-sample) vcf.gz file. Requires tumor_tissue to be specified.
germline: single-sample vcf.gz file. Does not require tumor_tissue.
cnv: for nf-core/sarek, CNVkit output (cnr file) is supported. For DRAGEN, a vcf.gz file is required. Does not require tumor_tissue.

Available tumor_tissue are: Adrenal_Gland Bile_Duct Bladder Blood Bone Bone_Marrow Brain Breast Cancer_all Cervix Colorectal Esophagus Eye Head_and_Neck Inflammatory Intrahepatic Kidney Liver Lung Lymph_Nodes Nervous_System Other Ovary Pancreas Pleura Prostate Skin Soft_Tissue Stomach Testis Thymus Thyroid Uterus
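
The samplesheet rules above can be checked before launching a run. This is a minimal sketch under stated assumptions, not variantalker's own validation: it assumes a comma-separated file with the columns in the order shown above, and the function name `validate_samplesheet` is hypothetical.

```python
import csv
import io
import os

EXPECTED_HEADER = ["patient", "tumor_tissue", "sample_file", "sample_type"]
SAMPLE_TYPES = {"somatic", "germline", "cnv"}
# Tissue names copied from the list above.
TUMOR_TISSUES = set(
    "Adrenal_Gland Bile_Duct Bladder Blood Bone Bone_Marrow Brain Breast "
    "Cancer_all Cervix Colorectal Esophagus Eye Head_and_Neck Inflammatory "
    "Intrahepatic Kidney Liver Lung Lymph_Nodes Nervous_System Other Ovary "
    "Pancreas Pleura Prostate Skin Soft_Tissue Stomach Testis Thymus Thyroid "
    "Uterus".split()
)


def validate_samplesheet(text: str) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or [c.strip() for c in rows[0]] != EXPECTED_HEADER:
        return ["header must be: " + ",".join(EXPECTED_HEADER)]
    errors = []
    for lineno, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore blank lines
        if len(row) != 4:
            errors.append(f"line {lineno}: expected 4 columns, got {len(row)}")
            continue
        _patient, tissue, sample_file, sample_type = (c.strip() for c in row)
        if not os.path.isabs(sample_file):
            errors.append(f"line {lineno}: sample_file must be a full path")
        if sample_type not in SAMPLE_TYPES:
            errors.append(f"line {lineno}: unknown sample_type '{sample_type}'")
        elif sample_type == "somatic" and tissue not in TUMOR_TISSUES:
            errors.append(f"line {lineno}: somatic samples need a valid tumor_tissue")
    return errors
```

For example, `validate_samplesheet(open("samplesheet.csv").read())` returns the problems found, one message per issue.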

Output

Output structure:

params.outdir
|-- date
|   `-- annotation
|       |-- germline
|       |   `-- patient
|       |       |-- filtered.patient.maf.pass.tsv
|       |       |-- filtered.patient.maf.nopass.tsv
|       |       |-- patient.vcf
|       |       `-- patient.maf
|       |-- somatic
|       |   `-- patient
|       |       |-- filtered.patient.maf.pass.tsv
|       |       |-- filtered.patient.maf.nopass.tsv
|       |       |-- patient.vcf
|       |       `-- patient.maf
|       `-- cnv
|           `-- patient
|               `-- patient.cnv.annotated.tsv

For each sample, variantalker outputs multiple files:

maf file with all the annotations
vcf file with the PASS variants
filtered pass file with variants passing the filters (see below).
filtered nopass file with variants not passing the filters (see below).
cnv annotated file (if cnv samples provided)

Default filters applied:

"Silent", "IGR", and "RNA" variant types are filtered out (unless the variant is pathogenic or likely pathogenic according to clinvar/cancervar/intervar)
minimum coverage: 50 (unless the variant is pathogenic or likely pathogenic according to clinvar/cancervar/intervar)
minimum somatic VAF: 0.01
minimum germline VAF: 0.2
InterVar classes to be kept: Pathogenic,Likely pathogenic (logic OR)
CancerVar classes to be kept: Tier_II_potential,Tier_I_strong (logic OR)
ReNOVo class to be kept: LP Pathogenic,IP Pathogenic,HP Pathogenic (logic OR)
CIViC evidence levels to be kept: A,B,C (logic OR)
no filters on genes (somatic or germline)

Logic OR filters: a variant is kept if at least one of the filters marked (logic OR) is satisfied.
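
The default filters can be sketched in a few lines. This is an illustration only, not the pipeline's implementation: the dictionary keys (`variant_type`, `coverage`, `vaf`, `intervar`, `cancervar`, `renovo`, `civic`, `pathogenic_override`) are hypothetical field names, and the README does not state how the OR filters combine with the type, coverage, and VAF filters, so the three checks are kept as separate functions.

```python
# Thresholds and class names copied from the default filters above.
EXCLUDED_TYPES = {"Silent", "IGR", "RNA"}
MIN_COVERAGE = 50
MIN_VAF = {"somatic": 0.01, "germline": 0.2}
KEEP_INTERVAR = {"Pathogenic", "Likely pathogenic"}
KEEP_CANCERVAR = {"Tier_II_potential", "Tier_I_strong"}
KEEP_RENOVO = {"LP Pathogenic", "IP Pathogenic", "HP Pathogenic"}
KEEP_CIVIC = {"A", "B", "C"}


def passes_or_filters(variant: dict) -> bool:
    """True if at least one of the logic-OR filters accepts the variant."""
    return (
        variant.get("intervar") in KEEP_INTERVAR
        or variant.get("cancervar") in KEEP_CANCERVAR
        or variant.get("renovo") in KEEP_RENOVO
        or variant.get("civic") in KEEP_CIVIC
    )


def passes_type_and_coverage(variant: dict) -> bool:
    """Variant-type and coverage filters, skipped for (likely) pathogenic variants.

    `pathogenic_override` is an assumed boolean that the caller sets when
    clinvar/cancervar/intervar report the variant as pathogenic or likely
    pathogenic.
    """
    if variant.get("pathogenic_override", False):
        return True
    return (
        variant["variant_type"] not in EXCLUDED_TYPES
        and variant["coverage"] >= MIN_COVERAGE
    )


def passes_vaf(variant: dict, sample_type: str) -> bool:
    """Minimum VAF filter; sample_type is 'somatic' or 'germline'."""
    return variant["vaf"] >= MIN_VAF[sample_type]
```

A variant with CIViC evidence level B and no other classification satisfies `passes_or_filters`, since one accepted class is enough.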

Liability

Variantalker assumes no responsibility for any injury to persons or damage to property arising out of, or related to, any use of Variantalker, or for any errors or omissions. The user recognizes they are using Variantalker at their own risk.
