TSNAD

An integrated pipeline for neoantigen prediction from NGS data.

Authors: Zhan Zhou, Jingcheng Wu, Xingzheng Lyu, Jianan Ren
Date: July 2021
Version: 2.0.1
License: TSNAD is released under GNU license
System: Linux
Contact: zhanzhou@zju.edu.cn

Introduction

An integrated software for cancer somatic mutation and tumour-specific neoantigen detection.

Installation and usage

There are two ways to install TSNAD:

Install with Docker; no other tools need to be pre-installed (strongly recommended; works on both Linux and Windows)
Install from GitHub, with all required tools installed manually (Linux only)

Installed by docker

First, install Docker (https://docs.docker.com/).

Then run the following command to pull the TSNAD image:

docker pull biopharm/tsnad:latest

The download may take several hours because the image is large.

Usage by docker

Start the TSNAD container, mounting your WES/WGS, RNA-seq and output directories with the following command (the RNA-seq directory is optional):

docker run -it -v [dir of WES/WGS]/:/home/tsnad/samples -v [dir of RNA-seq]:/home/tsnad/RNA-seq -v [output dir]:/home/tsnad/results biopharm/tsnad:latest /bin/bash
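The mount layout above can also be assembled programmatically. The following is a minimal sketch run on the host with Python 3.8+, not part of TSNAD itself: the function name `docker_run_command` is hypothetical, and it assumes that skipping RNA-seq simply means leaving out the corresponding `-v` mount.

```python
import shlex

IMAGE = "biopharm/tsnad:latest"


def docker_run_command(wes_dir, out_dir, rna_dir=None):
    """Build the `docker run` argument list with the container paths from the README."""
    args = ["docker", "run", "-it", "-v", f"{wes_dir}:/home/tsnad/samples"]
    # Assumption: the RNA-seq mount is left out when no RNA-seq data is provided.
    if rna_dir is not None:
        args += ["-v", f"{rna_dir}:/home/tsnad/RNA-seq"]
    args += ["-v", f"{out_dir}:/home/tsnad/results", IMAGE, "/bin/bash"]
    return args


def as_shell(args):
    """Render the argument list as a copy-pastable, safely quoted shell line."""
    return shlex.join(args)
```

For example, `print(as_shell(docker_run_command("/data/wes", "/data/out")))` prints a command line that can be pasted into a terminal; the `/data/...` paths are placeholders.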

Inside the container, run the following commands to start neoantigen prediction from the WES/WGS data:

cd /home/tsnad

bash uncompress.sh

python TSNAD.py -I samples/ -R RNA-seq/ -V [grch37/grch38] -O results/

All results are stored in the folder results/, and the final neoantigen results are stored in results/deephlapan_results/.

Installed by github

Requirements

TSNAD uses the following software and libraries:

Trimmomatic 0.39 (In Tools/)
BWA 0.7.17 (In Tools/)
SAMtools 1.13 (In Tools/)
GATK 4.2.0.0
VEP 104
hisat2 2.2.1
Stringtie 2.1.6 (In Tools/)
OptiType 1.3.5 (In Tools/)
STAR 2.7 (In Tools/)
Arriba 1.1.0 (In Tools/)
DeepHLApan 1.1 (In Tools/)
JAVA 1.8
Python 2.7
Perl 5.22

The first eleven tools in the list (Trimmomatic through DeepHLApan) are best placed in the folder Tools/.

Installation of each module

Trimmomatic
```
 unzip Trimmomatic-*.zip
```

BWA

 tar -xjvf bwa-*.tar.bz2
 cd bwa-*
 make
 
 # open ~/.bashrc in an editor, add the export line below, then reload it
 vim ~/.bashrc
 export PATH=$PATH:/home/tsnad/Tools/bwa-0.7.17/
 source ~/.bashrc

SAMtools

 sudo apt-get install libncurses5-dev
 sudo apt-get install libbz2-dev
 sudo apt-get install liblzma-dev
 tar -xjvf samtools-*.tar.bz2
 cd samtools-*
 ./configure
 make
 sudo make install

GATK

 unzip gatk-*.zip
 sudo apt install openjdk-8-jdk-headless
 
 # The necessary files for grch37
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/1000G_phase1.snps.high_confidence.b37.vcf.idx.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/dbsnp_138.b37.vcf.idx.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/Mills_and_1000G_gold_standard.indels.b37.vcf.idx.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.gz  
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.fai.gz  
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.ann.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.bwt.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.amb.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.pac.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.fasta.sa.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.2bit.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/b37/human_g1k_v37.dict.gz
 
 # The necessary files for grch38
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/1000G_phase1.snps.high_confidence.hg38.vcf.gz.tbi
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz.tbi
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.gz
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.fai
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.fasta.64.alt
 wget ftp://gsapubftp-anonymous@ftp.broadinstitute.org/bundle/hg38/Homo_sapiens_assembly38.dict

Uncompress all the downloaded files and put them in the same folder (e.g. gatk-*/b37/).

Note that the chromosome names in the dbSNP file differ from those in the other files, so the file needs to be transformed as follows:

 perl sub/transform.pl dbsnp_138.b37.vcf dbsnp_138.b37_adj.vcf

VEP

 unzip ensembl-vep-release-*.zip
 cd ensembl-vep-release-*
 perl INSTALL.pl

During installation, download the API and the cache: homo_sapiens_merged_vep_104_GRCh37.tar.gz for grch37, or homo_sapiens_merged_vep_104_GRCh38.tar.gz for grch38.

If that does not work, try the following steps:

 cd 
 mkdir src
 cd src
 wget ftp://ftp.ensembl.org/pub/ensembl-api.tar.gz
 wget https://cpan.metacpan.org/authors/id/C/CJ/CJFIELDS/BioPerl-1.6.924.tar.gz
 tar -xvf ensembl-api.tar.gz
 tar -xvf BioPerl-1.6.924.tar.gz

 PERL5LIB=${PERL5LIB}:${HOME}/src/BioPerl-1.6.924
 PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl/modules
 PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-compara/modules
 PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-variation/modules
 PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-funcgen/modules
 PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-io/modules
 PERL5LIB=${PERL5LIB}:${HOME}/src/ensembl-tools
 export PERL5LIB
 
 sudo perl -MCPAN -e shell
 install Bio::PrimarySeqI
 install DBI

Hisat2

 unzip hisat2-*.zip
 cd hisat2-*
 
 # The necessary files for grch37
 wget https://genome-idx.s3.amazonaws.com/hisat/grch37_genome.tar.gz
 wget http://ftp.ensembl.org/pub/grch37/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh37.87.gtf.gz
 tar -zxvf grch37_genome.tar.gz
 gunzip Homo_sapiens.GRCh37.87.gtf.gz -d
 
 # The necessary files for grch38
 wget https://genome-idx.s3.amazonaws.com/hisat/grch38_genome.tar.gz
 wget http://ftp.ensembl.org/pub/release-104/gtf/homo_sapiens/Homo_sapiens.GRCh38.104.gtf.gz
 tar -zxvf grch38_genome.tar.gz
 gunzip Homo_sapiens.GRCh38.104.gtf.gz -d

Stringtie
```
 tar -zxvf stringtie-*.tar.gz
```

OptiType

 unzip OptiType.zip -d OptiType
 
 cd OptiType/glpk-5.0
 ./configure
 make && make install
 
 cd ../OptiType/hdf5-1.12.1
 ./configure
 make && make install
 
 # add the line /usr/local/lib to /etc/ld.so.conf, then refresh the linker cache
 vim /etc/ld.so.conf
 /usr/local/lib
 /sbin/ldconfig -v
 
 pip install numpy
 pip install pyomo
 pip install pysam
 pip install matplotlib
 pip install tables
 pip install pandas
 pip install future

STAR

 unzip STAR-master.zip
 cd STAR-master/source
 make STAR
 
 # The necessary files for grch37
 wget  ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/gencode.v19.annotation.gtf.gz
 
 # The necessary files for grch38
 wget  ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gtf.gz

Arriba

 tar -xvf arriba_v1.1.0.tar.gz
 cd arriba_v1.1.0 && make

DeepHLApan

 unzip deephlapan.zip -d deephlapan
 cd deephlapan
 python setup.py install

Usage by github

Configure the file in the directory /config, taking grch38 as an example:

 trimmomatic_tool /home/tsnad/Tools/Trimmomatic-0.39/trimmomatic-0.39.jar
 bwa_folder /home/tsnad/Tools/bwa-0.7.17/
 samtools_folder /home/tsnad/Tools/samtools-1.13/
 gatk_tool /home/tsnad/Tools/gatk-4.2.0.0/gatk-package-4.2.0.0-local.jar
 VEP_folder /home/tsnad/Tools/ensembl-vep/
 hisat2_folder /home/tsnad/Tools/hisat2-2.1.0/
 stringtie_tool /home/tsnad/Tools/hisat2-2.1.0/stringtie-1.3.5.Linux_x86_64/stringtie
 Optitype_folder /home/tsnad/Tools/OptiType/
 star_folder /home/tsnad/Tools/STAR/
 arriba_folder /home/tsnad/Tools/arriba_v1.1.0/
 ref_human_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/Homo_sapiens_assembly38.fasta
 ref_1000G_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/1000G_phase1.snps.high_confidence.hg38.vcf
 ref_Mills_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/Mills_and_1000G_gold_standard.indels.hg38.vcf
 ref_dbsnp_file /home/tsnad/Tools/gatk-4.2.0.0/grch38/dbsnp_144.hg38_adj.vcf
 headcrop 10
 leading 3
 minlen 35
 needRevisedData True
 normal_f 0
 normal_reads 6
 slidingwindow 4:15
 threadNum 6
 trailing 3
 tumor_alt 5
 tumor_f 0.05
 tumor_reads 10
 typeNum 2
 laneNum 1
 partNum 2

Replace the path of each tool and reference file with your own. Do not change the other parameters, from headcrop to partNum, unless you know what they mean.
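Before a long run it can help to confirm that every configured path actually exists. The sketch below is an illustration only: it assumes the config file is a list of whitespace-separated `key value` lines as in the example above, and the helper names `parse_config` and `missing_paths` are hypothetical. How TSNAD itself reads the file is not shown here.

```python
import os


def parse_config(text):
    """Parse whitespace-separated 'key value' lines into a dict of strings."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split(None, 1)  # split on the first run of spaces or tabs
        if len(parts) != 2:
            raise ValueError(f"expected 'key value', got: {raw!r}")
        key, value = parts
        config[key] = value
    return config


def missing_paths(config):
    """Return the keys of tool/folder/file entries whose path does not exist."""
    suffixes = ("_tool", "_folder", "_file")
    return [key for key, value in config.items()
            if key.endswith(suffixes) and not os.path.exists(value)]
```

A typical use would be `missing_paths(parse_config(open("config/your_config_file").read()))`, where the file name is a placeholder; an empty list means every path entry was found.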

After configuration, return to the directory where TSNAD.py is located and run:

 python TSNAD.py -I [dir of WES/WGS] -R [dir of RNA-seq] -V [grch37/grch38] -O [dir of outputs]

The meaning of parameters in config file

headcrop: The number of bases to cut from the start of each read, default 10, used by Trimmomatic.
leading: Cut bases off the start of a read if they are below a threshold quality, default 3, used by Trimmomatic.
minlen: Drop the read if it is shorter than the specified length, default 35, used by Trimmomatic.
slidingwindow: Perform sliding-window trimming, cutting once the average quality within the window falls below a threshold, default 4:15, used by Trimmomatic.
normal_f: The maximum fraction of reads carrying the single nucleotide variant in the normal sample, default 0, used for somatic mutation filtering.
normal_reads: The minimum number of sequence reads in the normal sample, default 6, used for somatic mutation filtering.
tumor_alt: The minimum number of reads carrying the single nucleotide variant in the tumor sample, default 5, used for somatic mutation filtering.
tumor_f: The minimum fraction of reads carrying the single nucleotide variant in the tumor sample, default 0.05, used for somatic mutation filtering.
tumor_reads: The minimum number of sequence reads in the tumor sample, default 10, used for somatic mutation filtering.
typeNum: The number of input sample types (tumor and normal: 2; tumor only: 1), default 2. In this tool it is always 2.
laneNum: The number of sequencing lanes, default 1.
partNum: 1 for single-read sequencing, 2 for paired-end sequencing, default 2.
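To make the five somatic filtering thresholds concrete, here is a minimal sketch of how they could combine for a single variant. It is an interpretation of the descriptions above, not TSNAD's actual code: the names `Thresholds` and `passes_somatic_filter` are hypothetical, and the choice of inclusive comparisons (`>=`, `<=`) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Default values taken from the config file example."""
    normal_f: float = 0.0
    normal_reads: int = 6
    tumor_alt: int = 5
    tumor_f: float = 0.05
    tumor_reads: int = 10


def passes_somatic_filter(tumor_depth, tumor_alt, normal_depth, normal_alt,
                          t=Thresholds()):
    """Return True if a variant satisfies all five thresholds (assumed inclusive)."""
    if tumor_depth <= 0 or normal_depth <= 0:
        return False  # no coverage, fractions are undefined
    if tumor_depth < t.tumor_reads or normal_depth < t.normal_reads:
        return False  # not enough reads at this position
    if tumor_alt < t.tumor_alt:
        return False  # too few variant-supporting reads in the tumor
    tumor_fraction = tumor_alt / tumor_depth
    normal_fraction = normal_alt / normal_depth
    return tumor_fraction >= t.tumor_f and normal_fraction <= t.normal_f
```

With the defaults, a variant seen in 8 of 40 tumor reads and 0 of 20 normal reads passes, while the same variant with a single supporting read in the normal sample is rejected because normal_f is 0.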

With the default parameters, the input WGS/WES files in the input directory should be named:

	normal_L1_R1.fastq.gz
	normal_L1_R2.fastq.gz
	tumor_L2_R1.fastq.gz
	tumor_L2_R2.fastq.gz
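A quick pre-flight check of the input directory can catch naming mistakes before the pipeline starts. This sketch assumes the pattern `<normal|tumor>_L<lane>_R<1|2>.fastq.gz` inferred from the four names above and the paired-end default (partNum 2); the function `check_inputs` is hypothetical and not part of TSNAD.

```python
import re
from collections import defaultdict

# Pattern inferred from the example file names; an assumption, not a TSNAD rule.
NAME_RE = re.compile(r"^(normal|tumor)_L(\d+)_R([12])\.fastq\.gz$")


def check_inputs(filenames):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    mates = defaultdict(set)  # (sample, lane) -> set of mate numbers seen
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            problems.append(f"unexpected file name: {name}")
            continue
        sample, lane, mate = match.groups()
        mates[(sample, lane)].add(mate)
    seen_samples = {sample for sample, _ in mates}
    for sample in ("normal", "tumor"):
        if sample not in seen_samples:
            problems.append(f"no {sample} files found")
    for (sample, lane), found in sorted(mates.items()):
        if found != {"1", "2"}:
            problems.append(f"{sample}_L{lane} is missing a mate (R1/R2)")
    return problems
```

It could be called as `check_inputs(os.listdir("samples/"))` after `import os`. RNA-seq files live in a separate directory and are not covered by this check.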

The samples can be downloaded from the following links:

normal_L1_R1.fastq.gz
normal_L1_R2.fastq.gz
tumor_L2_R1.fastq.gz
tumor_L2_R2.fastq.gz
rna_L1_R1.fastq.gz
rna_L1_R2.fastq.gz

To generate usable neoantigen predictions, the minimum depth is 15X for WGS and 50X for WES; the recommended depth is 30X for WGS and 100X for WES. For a sample with WES tumor/normal data and RNA-seq data, neoantigen prediction takes about 50 hours on an Ubuntu system with 64 GB of memory and 512 GB of hard disk space.
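A rough way to see where a dataset falls relative to these depth figures is to estimate mean depth as total sequenced bases divided by target size. The sketch below uses that standard back-of-the-envelope formula; it ignores duplicates, trimming and off-target reads, so real depth will be lower. The function names are hypothetical, and the read count, read length and target size in the usage note are illustrative values, not figures from TSNAD.

```python
# (minimum, recommended) depth per assay, as stated in the text above.
THRESHOLDS = {"WGS": (15, 30), "WES": (50, 100)}


def mean_depth(read_count, read_length, target_bases):
    """Estimate mean depth as total sequenced bases over target size."""
    return read_count * read_length / target_bases


def depth_status(depth, assay):
    """Classify a depth against the minimum and recommended values."""
    minimum, recommended = THRESHOLDS[assay]
    if depth < minimum:
        return "below minimum"
    if depth < recommended:
        return "usable, below recommended"
    return "recommended"
```

For instance, 40 million reads of 150 bases over a hypothetical 60 Mb exome target give `mean_depth(40_000_000, 150, 60_000_000)`, which evaluates to 100.0 and is classified as "recommended" for WES.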

Update log

v2.0.1

2021.07

replace SOAP-HLA and Kourami with OptiType

the version of each tool is listed as follows:

 Trimmomatic 0.39
 BWA 0.7.17
 SAMtools 1.13   
 GATK 4.2.0.0
 VEP 104  
 Hisat2 2.2.1 
 Stringtie 2.1.6
 OptiType 1.3.5
 STAR 2.7
 Arriba 1.1.0
 DeepHLApan 1.1

v2.0

2019.09

provide the neoantigen prediction from indel and gene fusion
replace NetMHCpan with DeepHLApan
provide the docker version of TSNAD
provide the web-service of TSNAD (http://biopharm.zju.edu.cn/tsnad/)

v1.2

2019.05

VEP v94 -> v96
Add the selection of grch38 when calling mutations.

v1.1

2018.11

Trimmomatic v0.35 -> v0.38
BWA v0.7.12 -> v0.7.17
SAMtools v1.3 -> v1.9
Picard v1.140 -> embedded in GATK
GATK v3.5 -> v4.0.11.0
Annovar -> VEP v94
NetMHCpan v2.8 -> v4.0
Add the function of RNA-seq analysis for neoantigen filter.

v1.0

2017.04

GUI for neoantigen prediction
Two parts: one for somatic mutation detection, another for HLA-peptide binding prediction.
