
RNA Seq Read Representation by Trinity Assembly

Brian Haas edited this page Aug 21, 2016 · 23 revisions

# Assessing the Read Content of the Transcriptome Assembly

Assembled transcripts do not always fully represent properly paired reads: some transcripts are fragmented or short, so only one read of a pair may align. Simply aligning reads to your transcriptome assembly using bowtie or STAR will only capture the properly paired reads. To assess the read composition of the assembly, we want to capture and count all reads that map to the assembled transcripts, both those that are properly paired and those that are not.

To capture read alignments comprehensively, we run the process below: Bowtie2 aligns the reads to the transcriptome, and we then count the proper pairs and the improper or orphan read alignments.

First, build a bowtie2 index for the transcriptome:

bowtie2-build Trinity.fasta Trinity.fasta

Then perform the alignment (example for paired-end reads):

bowtie2 --local --no-unal -x Trinity.fasta -q -1 left_reads.fq -2 right_reads.fq \
     | samtools view -Sb - | samtools sort -no - > bowtie2.nameSorted.bam
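The same pipeline can be wrapped in a small shell function so that file names are passed as arguments. This is only a sketch: the function name `align_name_sorted` is hypothetical, and it assumes a recent samtools 1.x release, where `sort -o` names the output file rather than sending output to stdout.

```shell
# align_name_sorted REF LEFT RIGHT OUT_BAM
# Aligns paired-end reads to the assembly and writes a name-sorted BAM.
# Assumes bowtie2 and a samtools 1.x release are on PATH, and that the
# bowtie2 index for REF has already been built with bowtie2-build.
align_name_sorted() {
    bowtie2 --local --no-unal -x "$1" -q -1 "$2" -2 "$3" \
        | samtools view -b - \
        | samtools sort -n -o "$4" -
}

# Example call (not executed here):
#   align_name_sorted Trinity.fasta left_reads.fq right_reads.fq bowtie2.nameSorted.bam
```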

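To get alignment statistics, run the following script on the name-sorted BAM file. It prints the number and percentage of fragments in each alignment category, as in the example output that follows the command: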

   $TRINITY_HOME/util/SAM_nameSorted_to_uniq_count_stats.pl bowtie2.nameSorted.bam

#read_type	count	pct
proper_pairs	23383	79.62
improper_pairs	3676	12.52  (left and right reads align, but to different contigs due to fragmentation)
left_only	1199	4.08
right_only	1112	3.79

Total aligned rnaseq fragments: 29370
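The percentages in this table are each category's share of the total aligned fragments. As a sanity check, they can be recomputed from the raw counts with a short awk filter. The helper name `summarize_counts` is hypothetical and is not part of Trinity; it only assumes whitespace-delimited lines with the category in column 1 and an integer count in column 2.

```shell
# summarize_counts: read "<read_type> <count> ..." lines on stdin and print
# each category's count and percentage of the total, then the total itself.
# Lines starting with '#' and lines whose second field is not an integer
# (such as the "Total aligned rnaseq fragments" line) are skipped.
summarize_counts() {
    awk '
        /^#/ || $2 !~ /^[0-9]+$/ { next }
        { name[++n] = $1; count[n] = $2; total += $2 }
        END {
            if (total == 0) { print "no aligned fragments"; exit 1 }
            for (i = 1; i <= n; i++)
                printf "%s\t%d\t%.2f\n", name[i], count[i], 100 * count[i] / total
            printf "total\t%d\n", total
        }
    '
}

# Counts copied from the example output above.
printf '%s\n' \
    'proper_pairs 23383' \
    'improper_pairs 3676' \
    'left_only 1199' \
    'right_only 1112' | summarize_counts
```

For the example counts this reproduces the reported values: 79.62, 12.52, 4.08 and 3.79 percent, with a total of 29370 fragments.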

In a typical Trinity transcriptome assembly, the vast majority of reads map back to the assembly, and roughly 70-80% of the mapped fragments align as proper pairs.
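To check the first of these expectations, the total aligned fragment count can be compared with the number of input fragments. The sketch below uses two hypothetical helpers, `count_fragments` and `mapping_rate`, and assumes uncompressed FASTQ input with exactly four lines per record; for paired-end data, the number of records in the left read file equals the number of input fragments.

```shell
# count_fragments FILE: number of records in an uncompressed FASTQ file,
# assuming the common 4-lines-per-record layout (no wrapped sequences).
count_fragments() {
    awk 'END { print int(NR / 4) }' "$1"
}

# mapping_rate ALIGNED TOTAL: percentage of input fragments that aligned,
# or "NA" if TOTAL is zero.
mapping_rate() {
    awk -v aligned="$1" -v total="$2" \
        'BEGIN { if (total > 0) printf "%.2f\n", 100 * aligned / total; else print "NA" }'
}

# Example: 29370 is the total aligned fragment count from the output above.
if [ -f left_reads.fq ]; then
    total=$(count_fragments left_reads.fq)
    mapping_rate 29370 "$total"
fi
```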
