trinityrnaseq / GalaxyTrinityProtocol Public
Welcome to the protocol for Trinity RNA-seq Analysis in Galaxy
This page gives a stepwise protocol for Trinity RNA-seq analysis in a Galaxy instance.
To install the tool in a local Galaxy:
Log in to your local Galaxy instance as admin, go to 'Tool sheds' (right side) > 'Search and browse tool sheds', and search the public Tool Shed at https://toolshed.g2.bx.psu.edu/ for 'trinityrnaseq_protocol'.

Install the tool by clicking 'Install to Galaxy'.

Enter 'trinityrnaseq_protocol' under 'Add new tool panel section'.

This installs 'trinityrnaseq_protocol' in your local Galaxy instance.

Trinity RNA-seq analysis:
Upload the data:
Upload the data to analyze: under the 'Get data' option, add the files using 'Upload file'.

Join the data:
To generate a reference assembly that can later be used for analyzing differential expression, first combine the read datasets for the different conditions into a single target for Trinity assembly. Combine the left reads and the right reads of the paired ends separately.
In the ‘trinityrnaseq_protocol’ tool section, use ‘Concatenate Datasets’ to join Sp_ds.left.fq, Sp_hs.left.fq, Sp_log.left.fq, and Sp_plat.left.fq: add each file by clicking 'add new dataset', then click Execute.

Rename the output to All_left.fq by clicking the edit attributes (pencil) icon on the output in the history panel.
Do the same for the right-read datasets, naming the result All_Right.fq.
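For readers who want to check the result outside Galaxy, the join step amounts to appending the files one after another. The following is a minimal Python sketch of that idea, not the code Galaxy's 'Concatenate Datasets' tool runs; the function name `concatenate_datasets` is hypothetical, and the input names are the left-read files listed above.

```python
import shutil
from pathlib import Path


def concatenate_datasets(inputs, output):
    """Append each input file to `output`, in the order given."""
    with open(output, "wb") as out_handle:
        for path in inputs:
            with open(path, "rb") as in_handle:
                # Stream in chunks so large FASTQ files are never held in memory.
                shutil.copyfileobj(in_handle, out_handle)
    return Path(output)


if __name__ == "__main__":
    # Left-read files named in this protocol; skipped if they are not present.
    left_reads = ["Sp_ds.left.fq", "Sp_hs.left.fq", "Sp_log.left.fq", "Sp_plat.left.fq"]
    if all(Path(p).exists() for p in left_reads):
        concatenate_datasets(left_reads, "All_left.fq")
```

Join the right-read files in the same sample order as the left-read files, so that read pairs stay aligned between the two combined files.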
De novo assembly of reads using Trinity:
To run Trinity, select All_left.fq as 'Left reads' and All_Right.fq as 'Right reads', then click Execute.

This will generate two files:
- Trinity on data: Assembled transcripts
- Trinity on data: log
To view the first few lines of the assembled transcripts FASTA file, click the view (eye) icon on the output in the history panel.
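If you download the assembled transcripts, the same quick look can be taken locally. The sketch below is a plain-Python illustration that assumes a standard FASTA file (header lines starting with '>'); the helper names `read_fasta` and `peek_fasta` are hypothetical and not part of Trinity or Galaxy.

```python
from itertools import islice


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def peek_fasta(path, n=3):
    """Return the first `n` records without reading the rest of the file."""
    with open(path) as handle:
        return list(islice(read_fasta(handle), n))
```

Calling `peek_fasta("Trinity.fasta", n=5)` on a downloaded copy (the file name is only an example) returns the first five transcripts as (header, sequence) tuples.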
RSEM abundance estimation:
This step uses the Trinity assembly generated in the previous step ('Assembled transcripts') to quantify transcripts at the gene and isoform level. Use 'Assembled transcripts' as 'transcripts_fasta', and the left and right reads of one sample as the strand reads.
RSEM generates two files: gene counts and isoform counts. Rename the gene counts file to 'RSEM_abundance_estimation_LeftRight_DS: Gene counts' by clicking the edit attributes (pencil) icon on the output in the history panel.
Run RSEM in the same way on each of the remaining three samples.

Abundance Estimation to Matrix:
Next, join the RSEM-computed gene or isoform fragment counts into a single matrix file. This matrix is the input for edgeR, which identifies differentially expressed transcripts in the next few steps. We can join the RSEM-computed gene counts like so:

The result of this step is a counts matrix file named 'abundance_estimation_to_matrix: counts_matrix'. Rename the file to 'abundance estimation to matrix_DS_HS_log_Plat: Counts_matrix'.
Examine its contents by clicking the view (eye) icon.
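To make the shape of the counts matrix concrete, here is a small Python sketch that merges per-sample count tables into one matrix with a row per gene and a column per sample. It is an illustration, not the script the Galaxy tool runs. It assumes each per-sample file is tab-separated with a header row containing `gene_id` and `expected_count` columns; check the headers of your own RSEM outputs before relying on those names. All function names are hypothetical.

```python
import csv


def read_counts(path, id_column="gene_id", count_column="expected_count"):
    """Read one per-sample table into {feature_id: count}. Column names are assumptions."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column]: float(row[count_column]) for row in reader}


def build_counts_matrix(sample_files):
    """Merge {sample_name: path} into (sample_names, {feature_id: [counts]}).

    A feature missing from a sample gets a count of 0 for that sample.
    """
    samples = list(sample_files)
    per_sample = {name: read_counts(path) for name, path in sample_files.items()}
    features = sorted(set().union(*(c.keys() for c in per_sample.values())))
    matrix = {f: [per_sample[s].get(f, 0.0) for s in samples] for f in features}
    return samples, matrix


def write_matrix(samples, matrix, path):
    """Write the matrix as tab-separated text with sample names in the header."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow([""] + samples)
        for feature, counts in matrix.items():
            writer.writerow([feature] + [f"{c:.2f}" for c in counts])
```

With four samples, `build_counts_matrix({"DS": ..., "HS": ..., "log": ..., "Plat": ...})` yields one column per condition, in the order the samples were given.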
Differential Expression Using EdgeR:
To run edgeR and identify differentially expressed transcripts, use the 'abundance_estimation_to_matrix: counts_matrix' file as the input to 'Matrix of RNA-Seq fragment counts for transcripts per condition'. Use the assembled transcripts (for example, 'Trinity_leftRight_Assembled transcripts') as the input to 'Transcripts fasta file corresponding to matrix'. Leave the 'dispersion value' at its default.

This will generate an 'EdgeR_tar.gz' file containing volcano plots for the pairwise sample comparisons and the R scripts used in the next step of the analysis.
Download the 'EdgeR_tar.gz' file and unpack it; the 'EdgeR_results' folder inside contains the volcano plots.
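Once downloaded, the archive can be unpacked with any tar utility. As one option, the Python sketch below uses the standard-library `tarfile` module to extract it and list what it contains. The function name `extract_archive` and the destination folder are hypothetical; only unpack archives from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a tar archive (gzip is detected automatically) and list its files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression, whatever the file is named.
    with tarfile.open(archive_path, "r:*") as archive:
        archive.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.is_file())


if __name__ == "__main__":
    # 'EdgeR_tar.gz' is the download named in this protocol; skipped if absent.
    if Path("EdgeR_tar.gz").exists():
        for path in extract_archive("EdgeR_tar.gz", "edger_unpacked"):
            print(path)
```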
Extracting differentially expressed transcripts and generating heatmaps:
Extract the differentially expressed (DE) transcripts that show at least a 4-fold difference in expression, at a significance of <= 0.001, in any of the pairwise sample comparisons.
Use the 'EdgeR_tar.gz' file as the input to 'EdgeR tar gz file' and the 'abundance estimation to matrix_DS_HS_log_Plat: Counts_matrix' file as the input to the 'TMM Normalized FPKM matrix' option. Enter 0.001 in the 'P-value' option and 2 in the 'C-value' option. The C-value is on a log2 scale, so 2 corresponds to the 4-fold cutoff (2^2 = 4).

This will generate four files: ‘matrix file’, ‘sample_correlation_matrix’, ‘sample_correlation_matrix_pdf’, and ‘heatmap’. To view one, click its view (eye) icon, which downloads the file to your local drive.
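The two cutoffs used in this step can be read as a simple filter: keep a transcript if its absolute log2 fold change is at least 2 (4-fold, since 2^2 = 4) and its significance value is at most 0.001. The Python sketch below illustrates that rule on hand-made example rows. It is not the script the Galaxy tool runs, and the row keys `id`, `log2_fc`, and `p_value` are hypothetical names chosen for the example.

```python
def is_differentially_expressed(log2_fc, p_value, min_abs_log2_fc=2.0, max_p=0.001):
    """Apply the fold-change cutoff (C-value) and significance cutoff (P-value)."""
    return abs(log2_fc) >= min_abs_log2_fc and p_value <= max_p


def filter_de(rows, min_abs_log2_fc=2.0, max_p=0.001):
    """Keep rows (dicts with 'log2_fc' and 'p_value') that pass both cutoffs."""
    return [
        row for row in rows
        if is_differentially_expressed(row["log2_fc"], row["p_value"], min_abs_log2_fc, max_p)
    ]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    example = [
        {"id": "tA", "log2_fc": 3.1, "p_value": 0.0001},   # up, significant: kept
        {"id": "tB", "log2_fc": -2.5, "p_value": 0.0005},  # down, significant: kept
        {"id": "tC", "log2_fc": 1.2, "p_value": 0.00001},  # below 4-fold: dropped
        {"id": "tD", "log2_fc": 4.0, "p_value": 0.05},     # not significant: dropped
    ]
    print([row["id"] for row in filter_de(example)])
```

Taking the absolute value means transcripts that go down 4-fold are kept alongside those that go up 4-fold.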