Brian Haas edited this page Jun 16, 2017 · 27 revisions

Welcome to the Berlin 2017 Trinity Workshop Wiki!

Introductory slides from Carlo Pecoraro

Visit http://34.223.228.45/ws.html to access our AWS computational resources.

Agenda

Day                 Time       Activities
Monday, June 12     morning    Workshop Introduction -slides-
                               Exploring the Computational Infrastructure -slides-
                    afternoon  Unix command-line review
                               Data overview and setup
                               Using FastQC and Trimmomatic -slides-
Tuesday, June 13    morning    Trinity de novo transcriptome assembly -slides-
                    afternoon  Uploading own data or identifying and downloading SRA studies of interest -slides-
Wednesday, June 14  morning    Expression quantification -slides-
                               Quality assessment for assembly -slides-
                    afternoon  QC samples and replicates
Thursday, June 15   morning    Statistical methods for differential expression analysis -slides-
                    afternoon  Transcript clustering and expression profiling
                               Methods for functional annotation -slides-
                               Trinotate and TrinotateWeb
Friday, June 16     morning    Functional enrichment analysis
                               Review and custom data analyses
                               Comments on software installations for later use on different resources
Comments on software installations for later use on different resources

Misc notes

Downloading files from SRA:

The command below generates an 'interleaved' fastq file, where each read 1 record is followed immediately by its read 2 mate, and keeps only the first 1M read pairs (= the first 8M lines, since each pair is 2 records of 4 lines each).

Below, we retrieve SRA accession SRR390728 and use the -X 5 parameter simply to limit this small example to the first 5 spots (read pairs). When you run it on your sample of interest, use your own SRR accession and drop the -X 5 parameter.

% fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files -X 5 -Z  SRR390728 | \
       head -n8000000 | gzip > SRR390728.interleaved.fastq.gz
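If you want a different number of pairs, the head -n value is just pairs x 2 mates x 4 lines per fastq record. Here's a minimal bash sketch of that arithmetic, plus the reverse check (counting pairs in an interleaved file). The file toy.interleaved.fastq.gz is a made-up 2-pair example standing in for a real download, not SRA data.

```shell
#!/usr/bin/env bash
# Sketch: lines to keep with `head -n` for N pairs in an interleaved fastq
# (4 lines per record x 2 mates per pair).
pairs=1000000
lines=$(( pairs * 2 * 4 ))
echo "head -n${lines}"   # prints: head -n8000000

# Reverse check: count pairs in an interleaved file.
# toy.interleaved.fastq.gz is a hypothetical 2-pair file, not real SRA data.
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n@r1/2\nTTTT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n@r2/2\nCCCC\n+\nIIII\n' \
  | gzip > "$tmp/toy.interleaved.fastq.gz"
n_pairs=$(( $(gunzip -c "$tmp/toy.interleaved.fastq.gz" | wc -l) / 8 ))
echo "pairs: ${n_pairs}"  # prints: pairs: 2
rm -rf "$tmp"
```

On a real file, swap in SRR390728.interleaved.fastq.gz for the toy file to confirm you got the number of pairs you expected.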

Now, to de-interleave and generate the two separate fastq files for the 'left' and 'right' read mates, we can do the following:

% gunzip -c SRR390728.interleaved.fastq.gz | \
  paste - - - - - - - - | \
  tee >(cut -f 1-4 | tr '\t' '\n' | gzip > SRR390728_1.fastq.gz) | \
  cut -f 5-8 | tr '\t' '\n' | gzip -c > SRR390728_2.fastq.gz

The commands above are adapted from: https://biowize.wordpress.com/2015/03/26/the-fastest-darn-fastq-decoupling-procedure-i-ever-done-seen/
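To see what paste - - - - - - - - and cut are actually doing, here's a toy run on a made-up 2-pair file (toy.interleaved.fastq.gz, toy_1.fastq.gz and toy_2.fastq.gz are hypothetical names). It's a two-pass variant of the same trick: it reads the input twice instead of using tee. As far as I know, bash does not wait for a >(...) process substitution to finish, so this variant is a safe way to be sure toy_1.fastq.gz is complete before we look at it.

```shell
#!/usr/bin/env bash
# Toy two-pass de-interleave (sketch, made-up data): same paste/cut trick as above,
# but each mate gets its own pass, so no process substitution is involved.
set -euo pipefail
tmp=$(mktemp -d)
cd "$tmp"
printf '@r1/1\nACGT\n+\nIIII\n@r1/2\nTTTT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n@r2/2\nCCCC\n+\nIIII\n' \
  | gzip > toy.interleaved.fastq.gz

# paste with eight '-' joins every 8 lines (one pair) into one tab-separated row:
# fields 1-4 = mate 1 record, fields 5-8 = mate 2 record.
gunzip -c toy.interleaved.fastq.gz | paste - - - - - - - - | \
  cut -f 1-4 | tr '\t' '\n' | gzip > toy_1.fastq.gz
gunzip -c toy.interleaved.fastq.gz | paste - - - - - - - - | \
  cut -f 5-8 | tr '\t' '\n' | gzip > toy_2.fastq.gz

gunzip -c toy_1.fastq.gz   # lines: @r1/1 ACGT + IIII @r2/1 GGGG + IIII
gunzip -c toy_2.fastq.gz   # lines: @r1/2 TTTT + IIII @r2/2 CCCC + IIII
```

For the real 1M-pair file the single-pass tee version above is faster, since it only decompresses the input once.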

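Just for fun, here's an example where the entire process is piped together (the most complicated command I've ever written in Linux). Note that head now comes after paste, when each line already holds one full read pair, so -n1000000 keeps the same 1M pairs that -n8000000 kept above.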

% fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files -X 5 -Z  SRR390728 | \
  paste - - - - - - - - | \
  head -n1000000 | \
  tee >(cut -f 1-4 | tr '\t' '\n' | gzip > reads-1.fastq.gz) | \
  cut -f 5-8 | tr '\t' '\n' | gzip -c > reads-2.fastq.gz
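Since mates end up matched purely by position, a quick sanity check is to compare the read names in the two outputs. A minimal bash sketch, assuming headers end in /1 and /2 as produced by the --defline-seq '@$sn[_$rn]/$ri' format above; the names helper and the tiny toy files are hypothetical stand-ins for your real reads-1.fastq.gz and reads-2.fastq.gz.

```shell
#!/usr/bin/env bash
# Sketch: check that record i in reads-1 and reads-2 come from the same spot.
# Assumes headers end in /1 and /2 (the --defline-seq format used above).
set -euo pipefail
tmp=$(mktemp -d)
cd "$tmp"
# Hypothetical toy inputs standing in for the real reads-1/reads-2 files.
printf '@r1/1\nACGT\n+\nIIII\n@r2/1\nGGGG\n+\nIIII\n' | gzip > reads-1.fastq.gz
printf '@r1/2\nTTTT\n+\nIIII\n@r2/2\nCCCC\n+\nIIII\n' | gzip > reads-2.fastq.gz

# Hypothetical helper: print header lines (line 1 of every 4) minus the /1 or /2 suffix.
names() { gunzip -c "$1" | awk 'NR % 4 == 1' | sed 's#/[12]$##'; }

if cmp -s <(names reads-1.fastq.gz) <(names reads-2.fastq.gz); then
  echo "mates in sync"
else
  echo "mate order differs" >&2
  exit 1
fi
```

For real data, drop the two printf lines and run the rest in the directory holding your outputs.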

Windows users:

"Get that linux feeling ... on windows" cygwin: https://www.cygwin.com/