Skip to content

zsolt-balazs/LoRTIA

Repository files navigation

LoRTIA - Long-read RNA-Seq Transcript Isoform Annotator toolkit

version 0.9.9

The LoRTIA toolkit annotates transcript features such as transcriptional start sites (TSS), transcriptional end sites (TES), introns and transcript isoforms based on long-read RNA sequencing (direct RNA-Seq and cDNA-Seq) data. The toolkit expects SAM or BAM files as input and produces GFF files as well as other processed files as outputs. The sequencing reads can stem from any long-read sequencing platform (both Nanopore and PacBio reads are accepted). As long as the reads contain adapters which mark the two ends of the full-length transcripts, a complete isoform annotation is possible. The SAM or BAM files can be produced by any mapper, however when mapping with minimap2, either run minimap2 with the -Y option or filter out secondary alignments before using the LoRTIA toolkit. The toolkit was developed to be run in UNIX environments.

Contents

Installation

Clone the repository.

git clone https://github.com/zsolt-balazs/LoRTIA
cd LoRTIA
chmod 755 LoRTIA

Make sure you have all the dependencies and you will be able to run the program.

Dependencies

  • bedtools (tested v. 2.28) Make sure that bedtools is added to your PATH and can be run by the bedtools command!
  • pandas (tested v. 0.24.0 and up)
  • Biopython (tested Release 1.73)
  • pysam (tested v. 0.15.0)
  • scipy (tested v. 1.2.2)

If you have your dependencies installed, and also added LoRTIA to your PATH, then typing LoRTIA -h should show you the help menu.

Usage

The following examples will show how to determine isoforms from an alignments.sam file that was mapped to reference.fasta. The annotations and other processed files will be generated in the output_folder with the prefix of the SAM file. When running the LoRTIA pipeline, you have to specify the types of adapters that you used during your library preparation. The Lexogen Telo Prime cap selection kit, is the default adapter set of the pipeline. If your sample was prepared with this kit, simply type the following command:

LoRTIA -s poisson -f True /path/to/alignments.sam /path/to/output_folder /path/to/reference.fasta

If you prepared your library using the PacBio Isoseq, type the following command:

LoRTIA -5 AGAGTACATGGG --five_score 16 --check_in_soft 15 -3 AAAAAAAAAAAAAAA --three_score 18 -s poisson -f True /path/to/alignments.sam /path/to/output_folder /path/to/reference.fasta

If you prepared your library using standard Nanopore cDNA-Seq adapters, type the following command:

LoRTIA -5 TGCCATTAGGCCGGG --five_score 16 --check_in_soft 15 -3 AAAAAAAAAAAAAAA --three_score 16 -s poisson -f True /path/to/alignments.sam /path/to/output_folder /path/to/reference.fasta

If you prepared your library with a different set of adapters, you will have to specify those when running the program.

For more details, read the Wiki.

Credits

The LoRTIA toolkit is developed by Zsolt Balázs. Discussions and earlier work with Attila Szűcs (University of Szeged) contributed a lot of ideas to the implementation the toolkit. The early code was commented on by Tibor Nagy (University of Debrecen). Special thanks to the Department of Medical Biology at the University of Szeged for the early testing feedbacks.

About

LoRTIA toolkit alpha version

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Languages