zhouyulab/igia

Integrative Gene Isoform Assembler (IGIA)

Overview

Several high-throughput sequencing techniques are currently available for transcriptome profiling. Next-generation sequencing (NGS) based RNA-seq, which generates millions of short reads, is often used for gene expression profiling, but it cannot accurately identify full-length transcripts, not to mention the potential amplification biases introduced during library construction. PacBio sequencing offers long reads, with average read lengths over 10 kb, but is hindered by lower throughput, a higher error rate (11%-15%), and higher cost. We devised a computational pipeline named Integrative Gene Isoform Assembler (IGIA) to reconstruct accurate gene structures from PacBio long reads improved by ssRNA-seq correction, together with TSS/TES boundary information.

Repo contents

  • igia: Python package code.
  • docs: IGIA package documentation.
  • tests: Python unit tests written using the unittest package.

System requirements

Hardware requirements

The IGIA package can run on a standard computer or a server cluster. For single-process mode, we recommend a computer with more than 32 GB of RAM. For MPI mode, we recommend allocating 16 GB of memory per core.
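
As a rough illustration of the MPI recommendation above, the total memory to reserve grows linearly with the number of cores. The helper name `mpi_memory_gb` is hypothetical and not part of IGIA; the 16 GB figure is the per-core recommendation stated above.

```python
GB_PER_CORE = 16  # per-core recommendation from this README


def mpi_memory_gb(n_cores, gb_per_core=GB_PER_CORE):
    """Return the total memory (in GB) to reserve for an MPI run on n_cores cores."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return n_cores * gb_per_core


# An 8-core run, matching the `-n 8` used in the MPI example of this README.
print(mpi_memory_gb(8))
```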

Software Requirements

OS requirements

The package has been tested on the following Linux and macOS operating systems.

  • Linux: Ubuntu 16.04
  • Linux: Red Hat 4.8.3
  • macOS: MacBook Pro with macOS High Sierra

Installation guide

Before setting up the IGIA package, users should have Python 3.5.2 or higher and several packages from PyPI installed. We recommend using Conda to install and run IGIA.
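
A minimal sketch for checking the interpreter version against the requirement above. The function name `python_is_supported` is hypothetical and not part of IGIA; only the standard library is used.

```python
import sys

MIN_VERSION = (3, 5, 2)  # minimum Python version stated in this README


def python_is_supported(version_info=None, minimum=MIN_VERSION):
    """Return True if (major, minor, micro) is at least the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= minimum


# str.format is used instead of f-strings so the check also runs on Python 3.5.
print("Python {}.{}.{} supported: {}".format(
    sys.version_info[0], sys.version_info[1], sys.version_info[2],
    python_is_supported()))
```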

Download IGIA

git clone https://github.com/zhouyulab/igia.git path/to/igia

Replace 'path/to/' with the directory where you want to keep your local copy of the repo.

Prepare the virtual environment

IGIA is implemented in Python and depends on several packages. Installing and activating a Conda virtual environment as shown below helps ensure that the tools run properly.

Step 1: Download Miniconda3

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Step 2: Create a new Python3 environment

conda create -n igia python=3.5

Install IGIA

Software dependencies

Install the dependencies first, then install the IGIA package, as follows. The typical install time on a "normal" desktop computer is several minutes, depending on the network speed.

cd path/to/igia
source activate igia
pip install -r requirements.txt
python setup.py install

To test the package code, you can run unit tests in tests/test_*.py as below.

python setup.py test

Currently there are 115 tests. If any test fails, please create an issue or contact us by email.

Demo

If you have successfully installed IGIA, you can use the following command to run IGIA on test data.

cd /path/to/igia/tests
bash ./demo.sh

The expected run time for the demo on a "normal" desktop computer is about 4 minutes, and the results from IGIA will be written to /path/to/igia/tests/igia_demo.

The expected output, as in tests/igia_demo_expected, includes several iso*.bed12 files containing the assembled transcripts in BED12 format, and four *.bed6 files for the different genomic elements identified.
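
To inspect the assembled transcripts, each BED12 line can be expanded into exon intervals. The sketch below follows the standard BED12 column layout (blockCount, blockSizes, blockStarts in columns 10 to 12). The function name `bed12_exons` is hypothetical, and the example record is made up for illustration, not taken from the IGIA demo output.

```python
def bed12_exons(line):
    """Return (chrom, name, strand, exons) for one BED12 line.

    exons is a list of 0-based, half-open (start, end) genomic intervals.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected 12 tab-separated BED12 fields")
    chrom, chrom_start = fields[0], int(fields[1])
    name, strand = fields[3], fields[5]
    block_count = int(fields[9])
    # blockSizes and blockStarts may end with a trailing comma.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if len(sizes) != block_count or len(starts) != block_count:
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # Block starts are relative to chromStart.
    exons = [(chrom_start + s, chrom_start + s + size)
             for s, size in zip(starts, sizes)]
    return chrom, name, strand, exons


# Made-up three-exon transcript used only to demonstrate the parser.
example = "\t".join([
    "chr1", "1000", "5000", "tx1", "0", "+",
    "1000", "5000", "0", "3", "100,200,300,", "0,1500,3700,",
])
print(bed12_exons(example))
```

In practice the function would be applied to each line of an iso*.bed12 file opened with `open(path)`.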

Among IGIA-assembled transcripts, isoF class transcripts are the most reliable isoforms. For details, please refer to the IGIA manuscript.

Note: For TSS and TES, IGIA uses both the annotated sites and the sites predicted from TGS data. See the *.bed6 files for the sites used in transcript reconstruction.

Instructions for use

To run IGIA in single-process mode, you can execute:

source activate igia
igia --tgs tgs1.bam --tss tss.csv --tes tes.csv --ngs ngs1.bam ngs2.bam -o igia_res

See /path/to/igia/tests/example.sh for example usage with the full list of data sets from this study. Each BAM file is a small subset of reads. The expected output files are in tests/igia_res_expected.

To run IGIA in MPI mode on a cluster, you must first ensure that OpenMPI or MPICH is installed and configured on the cluster. Then you can execute:

source activate igia
mpirun -genv I_MPI_DEVICE ssm -n 8 igiampi \
  --tgs tgs1.bam --tss tss.csv --tes tes.csv --ngs ngs1.bam ngs2.bam -o igia_res
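
When many samples are processed, the two invocations above can be assembled programmatically. The sketch below only builds the argument list and does not run anything. The function name `build_igia_command` is hypothetical; the flags and the `mpirun` options are copied from the commands above, and the `-genv I_MPI_DEVICE ssm` setting may need to be adapted to your cluster.

```python
import shlex


def build_igia_command(tgs, tss, tes, ngs, out_dir, n_procs=None):
    """Build the igia (or mpirun ... igiampi) argument list.

    tgs, tss, tes and out_dir are single paths; ngs is a list of BAM paths.
    If n_procs is None the single-process command is returned.
    """
    args = ["--tgs", tgs, "--tss", tss, "--tes", tes, "--ngs"]
    args += list(ngs)
    args += ["-o", out_dir]
    if n_procs is None:
        return ["igia"] + args
    # MPI launcher options are taken verbatim from the README example.
    launcher = ["mpirun", "-genv", "I_MPI_DEVICE", "ssm", "-n", str(n_procs)]
    return launcher + ["igiampi"] + args


single = build_igia_command("tgs1.bam", "tss.csv", "tes.csv",
                            ["ngs1.bam", "ngs2.bam"], "igia_res")
mpi = build_igia_command("tgs1.bam", "tss.csv", "tes.csv",
                         ["ngs1.bam", "ngs2.bam"], "igia_res", n_procs=8)

# Print shell-safe command lines for inspection.
print(" ".join(shlex.quote(a) for a in single))
print(" ".join(shlex.quote(a) for a in mpi))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` inside the activated `igia` environment.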
