genome_guided_Trinity

Brian Haas edited this page Sep 9, 2017 · 1 revision

Genome-guided Trinity de novo assembly

If your RNA-Seq sample differs sufficiently from your reference genome and you'd like to capture variations within your assembled transcripts, consider performing a genome-guided de novo assembly. Trinity can accept a BAM file containing genome-aligned RNA-Seq reads as input. Reads are partitioned into coverage groups along the reference genome, and each read cluster is then assembled using the standard Trinity de novo assembly. Here, de novo assembly is restricted to only those reads that map to the genome. The advantage is that reads that share sequence but map to distinct parts of the genome are assembled separately. The disadvantage is that reads that do not map to the genome are not incorporated into the assembly. Unmapped reads can, however, be targeted for a separate genome-free de novo assembly.
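As a rough sketch of that last point, unmapped reads can be pulled out of the alignment file with samtools, which is an assumption here since this page does not prescribe a tool for it. The SAM flag 0x4 marks a read as unmapped, so filtering on it selects the reads the genome-guided run will not use. The helper name `extract_unmapped` and the file names are hypothetical, and paired-end data may need extra care when only one mate of a pair is unmapped.

```shell
# Hypothetical helper: copy only the unmapped reads from a BAM into a new BAM.
# Usage: extract_unmapped alignments.hisat2.bam unmapped.bam
extract_unmapped() {
    # -b    write BAM output
    # -f 4  keep only records with the "read unmapped" flag (0x4) set
    # -o    output file
    samtools view -b -f 4 -o "$2" "$1"
}
```

The resulting reads would still need to be converted back to FASTQ (for example with `samtools fastq`) before being used in a genome-free Trinity run.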

Trinity assembly from genome-aligned reads (BAM file)

Run genome-guided Trinity on our HISAT2-aligned reads like so:

%  $TRINITY_HOME/Trinity --genome_guided_bam alignments.hisat2.bam \
       --CPU 2 --max_memory 1G --genome_guided_max_intron 5000
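Trinity expects the BAM file given to --genome_guided_bam to be coordinate-sorted, so it can be worth checking this before launching the run. The sketch below inspects the SAM header for a sort-order declaration; the helper name `is_coordinate_sorted` is hypothetical, and a BAM whose header lacks the SO tag would fail this check even if its records happen to be sorted.

```shell
# Hypothetical helper: read SAM header text on stdin and succeed only if the
# @HD line declares coordinate sort order (SO:coordinate).
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

# Example use (assumes samtools is installed and the BAM named above exists):
#   samtools view -H alignments.hisat2.bam | is_coordinate_sorted \
#       || echo "BAM is not coordinate-sorted; run 'samtools sort' first" >&2
```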

Once Trinity completes, you'll once again find a trinity_out_dir/ in your workspace; in this case it will contain the resulting assembly as 'Trinity-GG.fasta'.

Examine this 'Trinity-GG.fasta' file:

%  less trinity_out_dir/Trinity-GG.fasta

>TRINITY_GG_1_c0_g1_i1 len=495 path=[1:0-494] [-1, 1, -2]
AGTTATTCAAGTTGTAAAAGGTTATACAATAATTTAACAACTACCTTTTTTATTCTGTCG
GGTTACTGACCTCACTTTATGTAAATACTTCGCATGACAAATTCAGTAACTCGTCTATTT
CAGCATGCATAAGACTTTTCACTAGGGAAACTGATAAAGCTTGAGTCAACTAAATCTGCC
TTCATACTTTATCAAGGGGAACCAAGCCTGCTGTGCTTACATCAGCATCTGGAAGACTTT
CCTCTCCTCTAATCTGTGTACACATCTCCAAGCAAGGAAGAAAAAACAAACTCTGCTCAG
ACGCCTATGAAACACCTGAATGAACTTTGATGAAGTACAGTCTGAGTTACCATCATGCAC
AAGTAGAACTGCTCTTGGACTTGTTTTCCTGTTGTTTGTGGAACCTACGCGTTTGAATGG
CTTGAACGTTGCATCTTTTAAAGTTATTTTTTAAGGGTTCTTGGCATTTATCCTAGTTGT
CCGTGTTTGGCAATG
>TRINITY_GG_2_c0_g1_i1 len=227 path=[1:0-226] [-1, 1, -2]
TAGAGGAGAAAATTTCTATGGTCTAGATATTACTTGTAAAGACATGAGAAACCTGAGGTT
CGCTTTGAAACAGGAAGGCCACAGCAGAAGAGATATGTTTGAGATCCTCACGAGATACGC
GTTTCCCCTGGCTCACAGTCTGCCATTATTTGCATTTTTAAATGAAGAAAAGTTTAACGT
GGATGGATGGACAGTTTACAATCCAGTGGAAGAATACAGGATGCCGG
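
Each record in the file starts with a '>' header line, so a quick way to see how many transcripts were assembled is to count those lines. This is only a convenience sketch; the helper name `count_transcripts` is hypothetical.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
# Usage: count_transcripts trinity_out_dir/Trinity-GG.fasta
count_transcripts() {
    # -c prints the number of matching lines; '^>' matches FASTA headers only
    grep -c '^>' "$1"
}
```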

Aligning Trinity-assembled Transcripts to the Genome

We'll use the GMAP software to align the Trinity transcripts to our reference genome. Trinity contains a utility that facilitates running GMAP: it first builds an index for the target genome and then runs the GMAP aligner:

% ${TRINITY_HOME}/util/misc/process_GMAP_alignments_gff3_chimeras_ok.pl \
     --genome minigenome.fa \
     --transcripts trinity_out_dir/Trinity-GG.fasta \
     --SAM | samtools view -Sb - | samtools sort -o trinity-GG.gmap.bam -

Index the BAM file and import it into IGV to view it alongside the aligned reads and the StringTie transcripts.
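
The indexing step is not spelled out on this page; one common way to do it, assuming samtools is installed, is `samtools index`. The wrapper below is a hypothetical helper (`index_bam` is not part of Trinity) that just adds a check that the file exists.

```shell
# Hypothetical helper: index a coordinate-sorted BAM so IGV can load it.
# Usage: index_bam trinity-GG.gmap.bam
index_bam() {
    if [ ! -f "$1" ]; then
        echo "BAM file not found: $1" >&2
        return 1
    fi
    # By default samtools writes the index next to the input as "$1.bai"
    samtools index "$1"
}
```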

How do the Trinity-reconstructed transcripts compare to StringTie?