RNA-Seq Read Representation by Trinity Assembly
# Assessing the Read Content of the Transcriptome Assembly
Assembled transcripts do not always fully represent properly paired reads: some transcripts are fragmented or short, so only one read of a pair may align. Simply aligning reads to your transcriptome assembly with bowtie or STAR will capture only the properly paired reads. To assess the read composition of the assembly, we want to capture and count all reads that map to the assembled transcripts, both those that are properly paired and those that are not.
To capture read alignments comprehensively, run the process below. Bowtie aligns each read of a pair to the transcriptome assembly separately. The read pairs are then grouped into proper pairs where possible, and reads that do not map as proper pairs are still retained.
$TRINITY_HOME/util/bowtie_PE_separate_then_join.pl --seqType fq \
--left left.fq --right right.fq \
--target Trinity.fasta --aligner bowtie \
-- -p 4 --all --best --strata -m 300 # following -- are params that get tacked onto the bowtie command.
As usual, if you have strand-specific RNA-Seq data, indicate this with the '--SS_lib_type' parameter, and put this parameter before the '--' above, since all the parameters after '--' are applied to the bowtie aligner.
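As a minimal sketch of the parameter placement described above, the following prints the command for a strand-specific library without running it. The library type `RF` is only an assumed example value, and the function name `print_ss_align_cmd` is hypothetical. Substitute the type that matches your protocol.

```shell
# Assumption: an RF-oriented paired-end library; use the value that matches your protocol.
SS_LIB_TYPE=RF

# Dry run: print the command instead of executing it.
# Remove the leading 'echo' to run the alignment for real.
print_ss_align_cmd() {
    echo "$TRINITY_HOME/util/bowtie_PE_separate_then_join.pl" --seqType fq \
        --left left.fq --right right.fq \
        --target Trinity.fasta --aligner bowtie \
        --SS_lib_type "$SS_LIB_TYPE" \
        -- -p 4 --all --best --strata -m 300
}

print_ss_align_cmd
```

Because `--SS_lib_type` sits before the bare `--`, it is read by the wrapper script and is not passed on to bowtie.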
An output directory 'bowtie_out' is created and should include the files:
bowtie_out.nameSorted.bam : alignments sorted by read name
bowtie_out.coordSorted.bam : alignments sorted by coordinate
To get alignment statistics, run the following on the name-sorted bam file:
$TRINITY_HOME/util/SAM_nameSorted_to_uniq_count_stats.pl bowtie_out/bowtie_out.nameSorted.bam
#read_type count pct
proper_pairs 23383 79.62
improper_pairs 3676 12.52 (left and right reads align, but to different contigs due to fragmentation)
left_only 1199 4.08
right_only 1112 3.79
Total aligned rnaseq fragments: 29370
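In the report above, each pct value is that category's count divided by the total number of aligned fragments. As a sanity check, the percentages can be recomputed from the counts. The sketch below assumes the report layout shown above (category, count, pct, optional note) and uses a hypothetical helper name, `recompute_pct`, that reads the report on standard input.

```shell
# Recompute the pct column from the counts in a stats report read on stdin.
# Only the four category rows are used; header and total lines are skipped.
recompute_pct() {
    awk '
        $1 ~ /^(proper_pairs|improper_pairs|left_only|right_only)$/ {
            name[++n] = $1      # remember category order
            count[n] = $2
            total += $2         # total aligned fragments
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%s\t%d\t%.2f\n", name[i], count[i], 100 * count[i] / total
        }
    '
}

# Example using the counts from the report above.
recompute_pct <<'EOF'
#read_type count pct
proper_pairs 23383 79.62
improper_pairs 3676 12.52
left_only 1199 4.08
right_only 1112 3.79
EOF
```

To use it on a real run, the report from `SAM_nameSorted_to_uniq_count_stats.pl` could be piped into the function or saved to a file and redirected into it.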
In a typical Trinity transcriptome assembly, the vast majority of reads map back to the assembly, and roughly 70-80% of the mapped fragments align as proper pairs.
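The stats report gives the number of aligned fragments but not the number of input fragments, so checking the overall mapping rate needs one more count. The sketch below is one way to get it, assuming uncompressed FASTQ input with four lines per record. The helper names `count_fastq_records` and `mapping_rate` are hypothetical.

```shell
# Count fragments in an uncompressed FASTQ file (assumes 4 lines per record).
count_fastq_records() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Percentage of input fragments that aligned: mapping_rate ALIGNED TOTAL
mapping_rate() {
    awk -v a="$1" -v t="$2" 'BEGIN {
        if (t > 0) printf "%.2f\n", 100 * a / t
        else print "NA"     # avoid dividing by zero on empty input
    }'
}

# Usage, assuming left.fq from the alignment step and the
# "Total aligned rnaseq fragments" value copied from the stats report:
#   total=$(count_fastq_records left.fq)
#   mapping_rate 29370 "$total"
```

For paired-end data, the number of records in the left file equals the number of fragments, which is the unit the stats report counts.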