

Percent Spliced-In (PSI) values are commonly used to report alternative pre-mRNA splicing (AS) changes. However, previous PSI-detection methods are limited to specific types of AS events. PSI-Sigma uses a new splicing index (PSIΣ) that is more flexible, can incorporate novel junctions, and can compute PSI values of individual exons in complex splicing events.

  • PSI-Sigma is currently released as obfuscated code for review purposes.


Kuan-Ting (Woody) Lin,

Alignment files

For short-read RNA-seq data, please generate .bam, .bai, and .SJ.out files using STAR.

###This is an example for short-read RNA-seq###
STAR --runThreadN 6 \
	--outSAMtype BAM SortedByCoordinate \
	--outFilterIntronMotifs RemoveNoncanonical \
	--genomeDir ~/index/starR100H38 \
	--twopassMode Basic \
	--readFilesIn R1.fastq R2.fastq \
	--outFileNamePrefix <NAME>.
samtools index <NAME>.Aligned.sortedByCoord.out.bam

For long-read RNA-seq data, please use GMAP.

###This is an example for long-read RNA-seq###
~/gmap-2017-11-15/bin/gmap -d GRCh38 -f samse --min-trimmed-coverage=0.5 --no-chimeras -B 5 -t 6 ~/MinION_long_read.fastq > <NAME>.sam
samtools view -bS <NAME>.sam > <NAME>.bam
samtools sort <NAME>.bam -o <NAME>.Aligned.sortedByCoord.out.bam
samtools index <NAME>.Aligned.sortedByCoord.out.bam

Quick Start

Create links to the .bam, .bai, and .SJ.out files in a folder (afolder). If you are using long-read RNA-seq data, the .SJ.out files will be generated automatically, since GMAP doesn't produce them.

mkdir afolder
cd afolder
ln -s bamfolder/*.bam* .
ln -s bamfolder/*.SJ.* .
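
Before moving on, it can help to confirm that every linked .bam file has an index next to it. The sketch below is not part of PSI-Sigma; `missing_bai` is a hypothetical helper, and it assumes the indexes are named `<file>.bam.bai`, which is what `samtools index` writes by default.

```shell
# Hypothetical helper (not part of PSI-Sigma): print every .bam file in a
# folder that has no matching .bam.bai index beside it.
# Assumption: indexes follow the default `samtools index` naming (<file>.bam.bai).
missing_bai() {
    dir="${1:-.}"
    for bam in "$dir"/*.bam; do
        # Skip the literal pattern when the folder contains no .bam files.
        [ -e "$bam" ] || continue
        [ -e "$bam.bai" ] || echo "$bam"
    done
}
```

Running `missing_bai afolder` prints nothing when every .bam file is indexed.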

Download a .gtf file and sort the coordinates.

gzip -d Homo_sapiens.GRCh38.87.gtf.gz
(grep "^#" Homo_sapiens.GRCh38.87.gtf; grep -v "^#" Homo_sapiens.GRCh38.87.gtf | sort -k1,1 -k4,4n) > Homo_sapiens.GRCh38.87.sorted.gtf
rm Homo_sapiens.GRCh38.87.gtf
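
As a sanity check, the sort order can be verified without rewriting the file. This is a minimal sketch, assuming the same sort keys and the same locale as the command above; `gtf_is_sorted` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed (exit status 0) only if the non-comment lines
# of a .gtf file are already ordered by chromosome (column 1) and then by
# start coordinate (column 4, numeric), the same keys used for sorting above.
gtf_is_sorted() {
    grep -v "^#" "$1" | sort -c -k1,1 -k4,4n 2>/dev/null
}
```

For example, `gtf_is_sorted Homo_sapiens.GRCh38.87.sorted.gtf && echo "sorted"` prints `sorted` only when the check passes.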

Create two files: (1) groupa.txt and (2) groupb.txt. Please put the suffixes of your files in groupa.txt or groupb.txt. For example, the suffix of a "" file is "Sequins_MixA". The files listed in groupa.txt will be compared with those listed in groupb.txt. Below is an example of processing files from TCGA (11A means normal and 01A means tumor):

#For TCGA files:
ls *-11A-*.SJ* | sed s/ > groupa.txt
ls *-01A-*.SJ* | sed s/ > groupb.txt

#Alternatively, you can just put the names of your .bam files:
echo Sequins_MixA.Aligned.sortedByCoord.out.bam > groupa.txt
echo Sequins_MixB.Aligned.sortedByCoord.out.bam > groupb.txt
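
A typo in a group file is easy to miss, so a quick check of the entries against the folder contents may be useful. The sketch below is not part of PSI-Sigma. `unmatched_entries` is a hypothetical helper, and it assumes each entry is the leading part of at least one file name in the folder, as with "Sequins_MixA" or a full .bam file name in the examples above.

```shell
# Hypothetical helper (not part of PSI-Sigma): print every entry of a group
# file that does not match the start of any file name in the given folder.
# Assumption: an entry is either a full file name or its leading part.
unmatched_entries() {
    group_file="$1"
    dir="${2:-.}"
    # The `|| [ -n "$entry" ]` part keeps a last line without a trailing newline.
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue          # ignore blank lines
        set -- "$dir/$entry"*                # expand entry* inside the folder
        [ -e "$1" ] || echo "$entry"         # no existing match: report the entry
    done < "$group_file"
}
```

Running `unmatched_entries groupa.txt .` and `unmatched_entries groupb.txt .` should print nothing when all entries match.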

Run PSI-Sigma. After the .gtf file, specify 1 for short-read RNA-seq or 2 for long-read RNA-seq. The last argument specifies the minimum number of supporting reads for an AS event (10 in the example below).

#For short-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/ Homo_sapiens.GRCh38.87.sorted.gtf PSIsigma 1 10
#For long-read RNA-seq (minimum 10 supporting reads for an AS event)
perl ~/PSIsigma/ Homo_sapiens.GRCh38.87.sorted.gtf PSIsigma 2 10

That's it. The results will be in PSIsigma_r10_ir3.sorted.txt.




  • PDL::LiteF
  • PDL::Stats
  • Statistics::Multtest


# 0. Set up working directory for Perl library (Using Perl version 5.18 as an example)
export PERL5LIB=/usr/local/lib/perl/5.18

# 1. Install cpanm, then PDL::LiteF and PDL::Stats
cpan App::cpanminus
cpanm PDL::LiteF
cpanm PDL::Stats

# 2. Install GSL (Using GSL version 2.4 as an example)
tar zxvf gsl-2.4.tar.gz
cd gsl-2.4
# Configure and build before installing
./configure
make
make install
cd ..

# 3. Install PDL::GSL::CDF and Statistics::Multtest
cpanm PDL::GSL::CDF
cpanm Statistics::Multtest
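
After installation, one way to confirm that Perl can load the modules is to try loading each one from the command line. This is a minimal sketch; `missing_perl_modules` is a hypothetical helper, not something shipped with PSI-Sigma.

```shell
# Hypothetical helper: print the name of every Perl module that fails to load.
# `perl -MModule -e 1` loads the module and then runs an empty program.
missing_perl_modules() {
    for mod in "$@"; do
        perl -M"$mod" -e 1 >/dev/null 2>&1 || echo "$mod"
    done
}
```

For example, `missing_perl_modules PDL::LiteF PDL::Stats PDL::GSL::CDF Statistics::Multtest` prints nothing when all four modules load.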

PSI-Sigma on Windows OS

PSI-Sigma has been tested in Linux and Mac OS environments. On Windows, you can install a Linux bash shell to run PSI-Sigma.

Gene Expression Analysis for nanopore long-read RNA-seq

To use the

perl ~/PSIsigma/ Homo_sapiens.GRCh38.87.sorted.gtf Experiment.Aligned.sortedByCoord.out.bam

By default, 4 CPUs are used to calculate gene expression levels by matching constitutive exons in the gene annotation.



  • Lin KT, Ma WK, Scharner J, Liu YR, Krainer AR. 2018. A human-specific switch of alternatively spliced AFMID isoforms contributes to TP53 mutations and tumor recurrence in hepatocellular carcinoma. Genome Res doi:10.1101/gr.227181.117.

Commercial Use

