asmariyaz23 edited this page Jun 17, 2016 · 34 revisions

Welcome to the protocol for Trinity RNA-seq Analysis in Galaxy

This page is a step-by-step protocol for Trinity RNA-seq analysis in a Galaxy instance.

To install the tool in a local Galaxy instance:

Log in to your local Galaxy instance as an admin and search for 'trinityrnaseq_protocol' under 'Tool sheds' (right side) > 'Search and browse tool sheds'. The tool can also be looked up directly in the public Galaxy Tool Shed at https://toolshed.g2.bx.psu.edu/.

Figure 1_Search_toolshed

Install the tool by clicking 'Install to Galaxy'.

Figure 2_Search_toolshed2

Enter 'trinityrnaseq_protocol' under 'Add new tool panel section'.

Figure 3_Upload_tool

This installs 'trinityrnaseq_protocol' in your local Galaxy instance.

Figure 4_Galaxy_interface

Trinity RNA-seq analysis:

Upload the data:

Upload the data to analyze: under 'Get data', choose 'Upload file' and add the files.

Figure 5_upload_data

Join the data:

To generate a reference assembly that we can later use for analyzing differential expression, first combine the read data sets for the different conditions into a single target for Trinity assembly. Combine the left reads and the right reads of the paired-end data separately, like so:

Under the ‘trinityrnaseq_protocol’ tool, select ‘Concatenate Datasets’. Add Sp_ds.left.fq, Sp_hs.left.fq, Sp_log.left.fq, and Sp_plat.left.fq by clicking 'add new dataset' for each additional file, then execute, as shown below:

Figure 6_Join_data

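Now rename the output file to All_left.fq by clicking the edit attributes (pencil) icon on the output in the history tab.

Repeat the same steps for the right-read datasets, adding them in the same sample order as the left reads, and rename the output to All_Right.fq.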
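For readers who want to see what the concatenation step amounts to outside Galaxy, here is a minimal Python sketch. It is not the code behind the 'Concatenate Datasets' tool; the function name `concatenate_files` is hypothetical, and the file names are the ones used in this protocol, assumed to be present in the working directory.

```python
import shutil
from pathlib import Path


def concatenate_files(inputs, output):
    """Append each input file to `output`, in the order given."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                # Stream the file so large FASTQ files are not loaded into memory.
                shutil.copyfileobj(src, out)
    return Path(output)


if __name__ == "__main__":
    # File names taken from this protocol; assumed to exist locally.
    left = ["Sp_ds.left.fq", "Sp_hs.left.fq", "Sp_log.left.fq", "Sp_plat.left.fq"]
    if all(Path(p).exists() for p in left):
        concatenate_files(left, "All_left.fq")
```

The order of the inputs matters: the right-read files should be concatenated in the same sample order as the left-read files so that read pairs stay aligned between the two combined files.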

De novo assembly of reads using Trinity:

To run Trinity, select All_left.fq as the Left reads and All_Right.fq as the Right reads, then execute.

Figure 8_Trinity

This will generate two files:

  1. Trinity on data: Assembled transcripts
  2. Trinity on data: log

To view the first few lines of the assembled transcripts FASTA file, click the view (eye) icon on the output in the history tab.
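Beyond viewing the first lines, a quick sanity check on a downloaded copy of the assembly is to count the transcripts and their lengths. The sketch below assumes standard FASTA layout (a header line starting with '>' followed by one or more sequence lines); the function name `fasta_lengths` and the toy records are illustrative only and are not Trinity output.

```python
def fasta_lengths(lines):
    """Return {sequence_id: length} for FASTA text given as an iterable of lines."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The ID is the first whitespace-delimited token after '>'.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    # Toy records for illustration; real IDs come from the assembled transcripts file.
    example = [">seq1 some description", "ACGTACGT", ">seq2", "ACG", "TTT"]
    lengths = fasta_lengths(example)
    print(len(lengths), "sequences")
    for seq_id, length in lengths.items():
        print(seq_id, length)
```

To run it on a real file, pass an open file handle, for example `fasta_lengths(open(path))` where `path` points to the downloaded FASTA file.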

RSEM abundance estimation:

This step uses the Trinity assembly generated in the last step ('Assembled transcripts') to quantify transcripts at the gene and isoform level. Use 'Assembled transcripts' as 'transcripts_fasta', and the left and right reads of a single sample as the read inputs.

RSEM generates two files: gene counts and isoform counts. Rename the gene counts file to 'RSEM_abundance_estimation_LeftRight_DS: Gene counts' by clicking the edit attributes (pencil) icon on the output in the history tab.

Run RSEM in the same way on each of the remaining three samples.

Figure 10_RSEM

Abundance Estimation to Matrix:

Next, join the RSEM-computed gene or isoform fragment counts into a single matrix file. This matrix is the input to edgeR, which identifies differentially expressed transcripts in the next few steps. We can join the RSEM-computed gene counts like so:

Figure 11_abundance_estimates

The result of this step is a counts matrix file named 'abundance_estimation_to_matrix: counts_matrix'. Rename it to 'abundance estimation to matrix_DS_HS_log_Plat: Counts_matrix'.

Examine its contents by clicking the view (eye) icon.
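Conceptually, the matrix has one row per gene (or isoform) and one column per sample. The following Python sketch shows that merge on toy data. It is not the Galaxy tool's code; the function name `build_counts_matrix`, the feature IDs, and the counts are made up for illustration, and the choice to fill a missing feature with 0 is an assumption of this sketch.

```python
def build_counts_matrix(sample_counts):
    """Merge {sample: {feature_id: count}} into (header, rows).

    Features missing from a sample are filled with 0 (an assumption of this sketch).
    """
    samples = list(sample_counts)
    features = sorted({f for counts in sample_counts.values() for f in counts})
    header = [""] + samples
    rows = [[f] + [sample_counts[s].get(f, 0) for s in samples] for f in features]
    return header, rows


if __name__ == "__main__":
    # Toy counts, not real RSEM output.
    toy = {
        "ds": {"gene_a": 10, "gene_b": 0},
        "hs": {"gene_a": 4, "gene_c": 7},
    }
    header, rows = build_counts_matrix(toy)
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(value) for value in row))
```

Each column keeps the counts of one sample, so the column order should be recorded when the file is renamed, as the DS_HS_log_Plat suffix in the name above does.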

Differential Expression Using EdgeR:

To run edgeR and identify differentially expressed transcripts, use the counts matrix from the previous step ('abundance_estimation_to_matrix: counts_matrix', or the renamed file) as the input to 'Matrix of RNA-Seq fragment counts for transcripts per condition'. Use the assembled transcripts, such as 'Trinity_leftRight_Assembled transcripts', as the input to 'Transcripts fasta file corresponding to matrix'. The 'dispersion value' can be left unchanged.

Figure 12_DE_edgeR

This generates an 'EdgeR_tar.gz' file containing the volcano plots and the .Rscript files used in the next step of the analysis.

Download the 'EdgeR_tar.gz' file and extract it; the 'EdgeR_results' folder inside contains the volcano plots.

Extracting differentially expressed transcripts and generating heatmaps:

Extract the differentially expressed (DE) transcripts that are at least 4-fold differentially expressed at a significance of <= 0.001 in any of the pairwise sample comparisons.

Here, use the 'EdgeR_tar.gz' file as the input to 'EdgeR tar gz file' and the 'abundance estimation to matrix_DS_HS_log_Plat: Counts_matrix' file as the input to the 'TMM Normalized FPKM matrix' option. Enter 0.001 in the 'P-value' option and 2 in the 'C-value' option. The C-value is the minimum absolute log2 fold change, so 2 corresponds to a 4-fold change (2^2 = 4). The settings are shown below:

Figure 13_DE_transcripts

This generates four files: ‘matrix file’, ‘sample_correlation_matrix’, ‘sample_correlation_matrix_pdf’, and ‘heatmap’. To view one, click its view (eye) icon, which downloads the file to your local drive.
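The selection rule used in this step (significance <= 0.001 and at least 4-fold change) can be written as a small filter. The Python sketch below is only an illustration of that rule, not the tool's implementation; the function names, the record keys 'id', 'log2FC', and 'significance', and the toy values are all hypothetical.

```python
def is_differentially_expressed(log2_fc, significance, max_significance=0.001, min_abs_log2_fc=2.0):
    """Apply the protocol's thresholds: significance <= 0.001 and |log2 FC| >= 2 (4-fold)."""
    return significance <= max_significance and abs(log2_fc) >= min_abs_log2_fc


def filter_de(records, max_significance=0.001, min_abs_log2_fc=2.0):
    """Keep records (dicts with hypothetical keys) that pass both thresholds."""
    return [
        r
        for r in records
        if is_differentially_expressed(
            r["log2FC"], r["significance"], max_significance, min_abs_log2_fc
        )
    ]


if __name__ == "__main__":
    # Toy values for illustration only.
    toy = [
        {"id": "t1", "log2FC": 3.1, "significance": 0.0002},   # passes both thresholds
        {"id": "t2", "log2FC": -2.5, "significance": 0.0005},  # passes (down-regulated)
        {"id": "t3", "log2FC": 1.2, "significance": 0.0001},   # fold change too small
        {"id": "t4", "log2FC": 4.0, "significance": 0.02},     # not significant enough
    ]
    print([r["id"] for r in filter_de(toy)])
    print("C-value 2 means a fold change of", 2 ** 2)
```

The absolute value is what makes the filter symmetric: a transcript that is 4-fold lower in one condition (log2 fold change of -2) is retained just like one that is 4-fold higher.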