Percent Spliced-In (PSI) values are commonly used to report alternative pre-mRNA splicing (AS) changes. However, previous PSI-detection methods are limited to specific types of AS events. PSI-Sigma uses a new splicing index (PSIΣ) that is more flexible, can incorporate novel junctions, and can compute PSI values of individual exons in complex splicing events.

  • PSI-Sigma is currently released as obfuscated code for review purposes.


Kuan-Ting (Woody) Lin,

Alignment files

For short-read RNA-seq data, please generate .bam, .bai, and .SJ.out files using STAR.

###This is an example for short-read RNA-seq###
STAR --runThreadN 6 \
	--outSAMtype BAM SortedByCoordinate \
	--outFilterIntronMotifs RemoveNoncanonical \
	--genomeDir ~/index/starR100H38 \
	--twopassMode Basic \
	--readFilesIn R1.fastq R2.fastq \
	--outFileNamePrefix <NAME>.
samtools index <NAME>.Aligned.sortedByCoord.out.bam

For long-read RNA-seq data, please use GMAP.

###This is an example for long-read RNA-seq###
~/gmap-2017-11-15/bin/gmap -d GRCh38 -f samse --min-trimmed-coverage=0.5 --no-chimeras -B 5 -t 6 ~/MinION_long_read.fastq > <NAME>.sam
samtools view -bS <NAME>.sam > <NAME>.bam
samtools sort <NAME>.bam -o <NAME>.Aligned.sortedByCoord.out.bam
samtools index <NAME>.Aligned.sortedByCoord.out.bam

Quick Start

Create links to the .bam, .bai, and .SJ.out files in a folder (afolder). If you are using long-read RNA-seq data, the .SJ.out files will be generated automatically, since GMAP doesn't produce them.

mkdir afolder
cd afolder
ln -s bamfolder/*.bam* .
ln -s bamfolder/*.SJ.* .
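
Before moving on, it can help to confirm that every linked .bam file is readable and has an index next to it. The function below is a minimal sketch, not part of PSI-Sigma: the name check_bam_indexes is hypothetical, and it assumes the index is named either <file>.bam.bai (as written by samtools index) or <file>.bai.

```shell
# Hypothetical helper (not part of PSI-Sigma): report .bam files in a folder
# that are unreadable or lack a .bai index. Returns non-zero on any problem.
check_bam_indexes() {
    dir="${1:-.}"
    missing=0
    for bam in "$dir"/*.bam; do
        if [ ! -e "$bam" ]; then
            # Either the glob matched nothing or a symlink target is missing.
            echo "unreadable $bam"
            missing=1
            continue
        fi
        # Assumption: the index is <file>.bam.bai or <file>.bai.
        if [ -e "$bam.bai" ] || [ -e "${bam%.bam}.bai" ]; then
            echo "ok       $bam"
        else
            echo "no index $bam"
            missing=1
        fi
    done
    return "$missing"
}
```

Running check_bam_indexes afolder lists one line per .bam file. An "unreadable" line usually means a symbolic link points to a path that does not exist from inside the folder.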

Download a .gtf file and sort the coordinates.

gzip -d Homo_sapiens.GRCh38.87.gtf.gz
(grep "^#" Homo_sapiens.GRCh38.87.gtf; grep -v "^#" Homo_sapiens.GRCh38.87.gtf | sort -k1,1 -k4,4n) > Homo_sapiens.GRCh38.87.sorted.gtf
rm Homo_sapiens.GRCh38.87.gtf
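
To double-check the sorted annotation, the same sort keys can be reused in check mode. This is a minimal sketch under the assumption that "sorted" means ordered by column 1 and then numerically by column 4, exactly as in the sort command above; the function name gtf_is_sorted is hypothetical. Run it in the same locale that was used for sorting.

```shell
# Hypothetical helper (not part of PSI-Sigma): succeed if the non-comment
# lines of a GTF file are ordered by column 1, then numerically by column 4.
gtf_is_sorted() {
    # sort -c only checks the order; -s disables the whole-line tie-breaker
    # so that only the two keys are compared.
    grep -v "^#" "$1" | sort -c -s -k1,1 -k4,4n 2>/dev/null
}
```

For example, gtf_is_sorted Homo_sapiens.GRCh38.87.sorted.gtf && echo sorted prints "sorted" only when the order holds.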

Create two files: (1) groupa.txt and (2) groupb.txt, and list the suffixes of your files in groupa.txt or groupb.txt. For example, the suffix of a "" file is "Sequins_MixA". The samples in groupa.txt will be compared with those in groupb.txt. Below is an example of processing files from TCGA (11A means normal and 01A means tumor):

#For TCGA files:
ls *-11A-*.SJ* | sed s/ > groupa.txt
ls *-01A-*.SJ* | sed s/ > groupb.txt

#Alternatively, you can just put the names of your .bam files:
echo Sequins_MixA.Aligned.sortedByCoord.out.bam > groupa.txt
echo Sequins_MixB.Aligned.sortedByCoord.out.bam > groupb.txt
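
A typo in a group file is easy to miss, so it may be worth checking that each entry corresponds to a .bam file in the current folder. The sketch below is an illustration only: check_group_file is a hypothetical name, and it assumes an entry is valid when some .bam file name starts with it, which covers both the short name and the full .bam file name.

```shell
# Hypothetical helper (not part of PSI-Sigma): for every entry in a group
# file, confirm that at least one .bam file in the current folder starts
# with that entry. Returns non-zero if any entry has no match.
check_group_file() {
    status=0
    while IFS= read -r name || [ -n "$name" ]; do
        [ -n "$name" ] || continue      # skip blank lines
        found=0
        for f in "$name"*; do
            case "$f" in
                *.bam) if [ -e "$f" ]; then found=1; fi ;;
            esac
        done
        if [ "$found" -eq 1 ]; then
            echo "ok      $name"
        else
            echo "no .bam $name"
            status=1
        fi
    done < "$1"
    return "$status"
}
```

Calling check_group_file groupa.txt and check_group_file groupb.txt from inside the working folder prints one line per entry.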

Run PSI-Sigma. After the .gtf file, please specify 1 for short-read RNA-seq or 2 for long-read RNA-seq. The last argument specifies the minimum number of supporting reads for an AS event (10 in the example below).

#For short-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/ Homo_sapiens.GRCh38.87.sorted.gtf PSIsigma 1 10
#For long-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/ Homo_sapiens.GRCh38.87.sorted.gtf PSIsigma 2 10

That's it. The results will be in PSIsigma_r10_ir3.sorted.txt.




  • PDL::LiteF
  • PDL::Stats
  • Statistics::Multtest


# 0. Set up working directory for Perl library (Using Perl version 5.18 as an example)
export PERL5LIB=/usr/local/lib/perl/5.18

# 1. Install cpanm
cpan App::cpanminus
cpanm PDL::LiteF
cpanm PDL::Stats

# 2. Install GSL (Using GSL version 2.4 as an example)
tar zxvf gsl-2.4.tar.gz
cd gsl-2.4
# Generate the Makefile and build the library before installing it
./configure
make
make install
cd ..

# 3. Install PDL::GSL
cpanm PDL::GSL::CDF
cpanm Statistics::Multtest
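
After installation, a quick way to confirm that Perl can find the modules is to try loading each one. This is a minimal sketch with a hypothetical function name (check_perl_modules); it only relies on perl -M<module> -e 1, which exits with an error when a module cannot be loaded.

```shell
# Hypothetical helper (not part of PSI-Sigma): try to load each named Perl
# module and report whether it is available. Returns non-zero if any is missing.
check_perl_modules() {
    status=0
    for mod in "$@"; do
        if perl -M"$mod" -e 1 2>/dev/null; then
            echo "ok      $mod"
        else
            echo "MISSING $mod"
            status=1
        fi
    done
    return "$status"
}
```

For the modules installed above, the call would be check_perl_modules PDL::LiteF PDL::Stats PDL::GSL::CDF Statistics::Multtest. If a module is reported as missing even though cpanm succeeded, the PERL5LIB setting from step 0 is a likely place to look.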

PSI-Sigma on Windows OS

PSI-Sigma has been tested in Linux and Mac OS environments. To run PSI-Sigma on Windows, you can install a Linux bash shell.

Gene Expression Analysis for nanopore long-read RNA-seq

To use the

perl ~/PSIsigma/ Homo_sapiens.GRCh38.87.sorted.gtf Experiment.Aligned.sortedByCoord.out.bam

By default, 4 CPUs are used to calculate gene expression levels by matching constitutive exons in the gene annotation.



  • Lin KT, Ma WK, Scharner J, Liu YR, Krainer AR. 2018. A human-specific switch of alternatively spliced AFMID isoforms contributes to TP53 mutations and tumor recurrence in hepatocellular carcinoma. Genome Res doi:10.1101/gr.227181.117.

Commercial Use


