genome_guided_Trinity
If your RNA-Seq sample differs sufficiently from your reference genome and you'd like to capture variations within your assembled transcripts, you might consider performing a genome-guided de novo assembly. Trinity can accept a coordinate-sorted BAM file containing genome-aligned RNA-Seq reads as input. Reads are partitioned into coverage groups along the reference genome, and each read cluster is assembled using the standard Trinity de novo assembly. Here, de novo assembly is restricted to those reads that map to the genome. The advantage is that reads that share sequence but map to distinct parts of the genome are assembled separately. The disadvantage is that reads that do not map to the genome are not incorporated into the assembly. Unmapped reads can, however, be targeted for a separate genome-free de novo assembly.
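As a rough sketch of how unmapped reads could be set aside for a separate genome-free assembly, the snippet below uses samtools to pull out reads carrying the unmapped flag (0x4) and convert them to FASTQ. The helper name `extract_unmapped` and the output prefix `unmapped` are made up for illustration and are not part of Trinity. It also assumes the aligner kept unmapped reads in the BAM file (for example, that hisat2 was run without `--no-unal`). For paired-end data, mates may end up separated when only one of the pair is unmapped, so treat the output as a starting point.

```shell
# Hypothetical helper (not part of Trinity): extract reads flagged as
# unmapped (SAM flag 0x4) from a BAM file and write them out as FASTQ.
extract_unmapped() {
    bam=$1
    out_prefix=$2
    [ -f "$bam" ] || { echo "missing BAM: $bam" >&2; return 1; }
    # -c counts matching records; -f 4 keeps only unmapped reads
    samtools view -c -f 4 "$bam"
    # write the unmapped reads to their own BAM, then convert to FASTQ
    samtools view -b -f 4 "$bam" > "$out_prefix.bam"
    samtools fastq "$out_prefix.bam" > "$out_prefix.fq"
}

# Run only when samtools and the alignment file are both available.
if command -v samtools >/dev/null 2>&1 && [ -f alignments.hisat2.bam ]; then
    extract_unmapped alignments.hisat2.bam unmapped
fi
```

The resulting FASTQ file could then be given to a genome-free Trinity run.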
Run genome-guided Trinity leveraging our hisat2-aligned reads like so:
% $TRINITY_HOME/Trinity --genome_guided_bam alignments.hisat2.bam \
--CPU 2 --max_memory 1G --genome_guided_max_intron 5000
Once Trinity completes, you'll once again find a trinity_out_dir/ in your workspace; in this case it contains the resulting assembly as 'Trinity-GG.fasta'.
Examine this 'Trinity-GG.fasta' file:
% less trinity_out_dir/Trinity-GG.fasta
>TRINITY_GG_1_c0_g1_i1 len=495 path=[1:0-494] [-1, 1, -2]
AGTTATTCAAGTTGTAAAAGGTTATACAATAATTTAACAACTACCTTTTTTATTCTGTCG
GGTTACTGACCTCACTTTATGTAAATACTTCGCATGACAAATTCAGTAACTCGTCTATTT
CAGCATGCATAAGACTTTTCACTAGGGAAACTGATAAAGCTTGAGTCAACTAAATCTGCC
TTCATACTTTATCAAGGGGAACCAAGCCTGCTGTGCTTACATCAGCATCTGGAAGACTTT
CCTCTCCTCTAATCTGTGTACACATCTCCAAGCAAGGAAGAAAAAACAAACTCTGCTCAG
ACGCCTATGAAACACCTGAATGAACTTTGATGAAGTACAGTCTGAGTTACCATCATGCAC
AAGTAGAACTGCTCTTGGACTTGTTTTCCTGTTGTTTGTGGAACCTACGCGTTTGAATGG
CTTGAACGTTGCATCTTTTAAAGTTATTTTTTAAGGGTTCTTGGCATTTATCCTAGTTGT
CCGTGTTTGGCAATG
>TRINITY_GG_2_c0_g1_i1 len=227 path=[1:0-226] [-1, 1, -2]
TAGAGGAGAAAATTTCTATGGTCTAGATATTACTTGTAAAGACATGAGAAACCTGAGGTT
CGCTTTGAAACAGGAAGGCCACAGCAGAAGAGATATGTTTGAGATCCTCACGAGATACGC
GTTTCCCCTGGCTCACAGTCTGCCATTATTTGCATTTTTAAATGAAGAAAAGTTTAACGT
GGATGGATGGACAGTTTACAATCCAGTGGAAGAATACAGGATGCCGG
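For a quick sanity check beyond paging through the file, one option is to count the records and add up the `len=` values from the header lines. This is a minimal sketch using awk. The function name `summarize_fasta` is hypothetical, and it assumes every header carries a `len=` field as in the records shown above.

```shell
# Hypothetical helper: count FASTA records and sum the len= field
# found on each header line of a Trinity assembly.
summarize_fasta() {
    awk '/^>/ {
             n++
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^len=/) { sub(/^len=/, "", $i); total += $i }
         }
         END { printf "transcripts=%d total_len=%d\n", n, total }' "$1"
}

# Run only if the assembly is present in the expected location.
if [ -f trinity_out_dir/Trinity-GG.fasta ]; then
    summarize_fasta trinity_out_dir/Trinity-GG.fasta
fi
```

For just the two records shown above, this would report transcripts=2 total_len=722. The full assembly will give different numbers.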
We'll use the GMAP software to align the Trinity transcripts to our reference genome. Trinity includes a utility that simplifies running GMAP: it first builds an index for the target genome and then runs the gmap aligner:
% ${TRINITY_HOME}/util/misc/process_GMAP_alignments_gff3_chimeras_ok.pl \
--genome minigenome.fa \
--transcripts trinity_out_dir/Trinity-GG.fasta \
--SAM | samtools view -Sb - | samtools sort -o trinity-GG.gmap.bam -
Index the BAM file and load it into IGV to view alongside the aligned reads and the StringTie transcripts.
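The indexing step can be sketched as follows, assuming samtools is on your PATH and the sorted BAM file from the previous step sits in the current directory. The wrapper function `index_bam` is only illustrative; the underlying command is `samtools index`.

```shell
# Hypothetical wrapper around `samtools index`; the index file is
# written next to the BAM file as <name>.bam.bai.
index_bam() {
    [ -f "$1" ] || { echo "missing BAM: $1" >&2; return 1; }
    samtools index "$1"
}

# Run only when samtools and the sorted BAM file are both available.
if command -v samtools >/dev/null 2>&1 && [ -f trinity-GG.gmap.bam ]; then
    index_bam trinity-GG.gmap.bam
fi
```

IGV looks for the .bai index alongside the BAM file when the BAM file is loaded.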
How do the Trinity-reconstructed transcripts compare to StringTie?