Home

Welcome to the Berlin 2017 Trinity Workshop Wiki!

Visit http://34.223.228.45/ws.html to access our AWS computational resources.

Agenda

Day	Time	Activities
Monday, June 12	morning	Workshop Introduction -slides-
		Exploring the Computational Infrastructure -slides-
	afternoon	Unix command-line review
		Data overview and setup
		Using FASTQC and Trimmomatic -slides-

Tuesday, June 13	morning	Trinity de novo transcriptome assembly -slides-
	afternoon	Uploading own data or identifying and downloading SRA studies of interest -slides-

Wednesday, June 14	morning	Expression quantification -slides-
		Quality assessment for assembly -slides-
	afternoon	QC samples and replicates

Thursday, June 15	morning	Statistical methods for differential expression analysis -slides-
	afternoon	Transcript clustering and expression profiling
		Methods for functional annotation -slides-
		Trinotate and TrinotateWeb

Friday, June 16	morning	Functional enrichment analysis
		Review and custom data analyses
		Comments on software installations for later use on different resources

Misc notes

Downloading files from SRA:

The command below generates an 'interleaved' fastq file, in which each read 1 record is followed immediately by its read 2 mate, and keeps the top 1M read pairs (8M lines, since each pair spans two 4-line fastq records).

Below, we retrieve SRA accession SRR390728 and use the -X 5 parameter simply to limit the number of reads in this small example. When you run it on your own sample of interest, substitute your SRR accession and omit the -X 5 parameter.

% fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files -X 5 -Z  SRR390728 | \
       head -n8000000 | gzip > SRR390728.interleaved.fastq.gz
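As a quick sanity check on the interleaved file, the line count should be a multiple of 8, and dividing it by 8 gives the number of read pairs. The sketch below assumes a gzipped interleaved fastq file with 4-line records; the helper name count_read_pairs is hypothetical and not part of the workshop materials.

```shell
# Hypothetical helper: print the number of read pairs in a gzipped,
# interleaved fastq file. Assumes 4-line records, so one pair = 8 lines.
count_read_pairs() {
    gunzip -c "$1" | awk 'END {
        if (NR % 8 != 0) {
            # A partial pair usually means the file was truncated mid-record.
            print "line count " NR " is not a multiple of 8" > "/dev/stderr"
            exit 1
        }
        print NR / 8
    }'
}

# Example usage (file name taken from the command above):
#   count_read_pairs SRR390728.interleaved.fastq.gz
```

With -X 5 in place, the reported count should be small; once -X 5 is removed, it should be at most 1000000 because of the head -n8000000 step.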

Now, to de-interleave and generate the two separate fastq files for the 'left' and 'right' read mates, we can do the following:

% gunzip -c SRR390728.interleaved.fastq.gz | \
  paste - - - - - - - - | \
  tee >(cut -f 1-4 | tr '\t' '\n' | gzip > SRR390728_1.fastq.gz) | \
  cut -f 5-8 | tr '\t' '\n' | gzip -c > SRR390728_2.fastq.gz

The above is adapted from: https://biowize.wordpress.com/2015/03/26/the-fastest-darn-fastq-decoupling-procedure-i-ever-done-seen/
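After de-interleaving, it can be worth confirming that the 'left' and 'right' files contain the same number of records before passing them to an assembler. The following is a minimal sketch of such a check; the function name check_mates is hypothetical, and it only compares line counts (it does not verify that read names match).

```shell
# Hypothetical helper: compare two gzipped fastq files and print the
# shared record count, or fail if the files differ in length.
check_mates() {
    left_n=$(gunzip -c "$1" | wc -l | tr -d ' ')
    right_n=$(gunzip -c "$2" | wc -l | tr -d ' ')
    if [ "$left_n" -ne "$right_n" ]; then
        echo "mismatch: $1 has $left_n lines, $2 has $right_n lines" >&2
        return 1
    fi
    if [ $((left_n % 4)) -ne 0 ]; then
        echo "line count $left_n is not a multiple of 4" >&2
        return 1
    fi
    # Each fastq record is 4 lines long.
    echo $((left_n / 4))
}

# Example usage (file names taken from the command above):
#   check_mates SRR390728_1.fastq.gz SRR390728_2.fastq.gz
```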

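Just for fun, here's an example where the entire process is piped together (the most complicated command I've ever written in Linux). Because paste has already joined each read pair onto a single line, head -n1000000 here keeps 1M pairs: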

% fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files -X 5 -Z  SRR390728 | \
  paste - - - - - - - - | \
  head -n1000000 | \
  tee >(cut -f 1-4 | tr '\t' '\n' | gzip > reads-1.fastq.gz) | \
  cut -f 5-8 | tr '\t' '\n' | gzip -c > reads-2.fastq.gz
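The tee >(...) construct above relies on process substitution, which is available in bash and zsh but not in every shell, and the background gzip may still be finishing when the prompt returns. As an alternative sketch (a different technique from the one above, not the workshop's method), the same split can be done with awk alone. The function name split_pairs and its arguments are hypothetical.

```shell
# Hypothetical helper: read interleaved fastq on stdin, keep the first
# MAX pairs, and write PREFIX-1.fastq.gz and PREFIX-2.fastq.gz.
# Usage: split_pairs MAX PREFIX
split_pairs() {
    max_pairs=$1
    prefix=$2
    awk -v max="$max_pairs" \
        -v left="$prefix-1.fastq" -v right="$prefix-2.fastq" '
        NR > max * 8      { exit }                 # stop after MAX pairs
        (NR - 1) % 8 < 4  { print > left; next }   # lines 1-4 of each pair
                          { print > right }        # lines 5-8 of each pair
    '
    # Compress once awk has finished writing both files.
    gzip -f "$prefix-1.fastq" "$prefix-2.fastq"
}

# Example usage, reusing the fastq-dump invocation shown above:
#   fastq-dump --defline-seq '@$sn[_$rn]/$ri' --split-files -Z SRR390728 | \
#     split_pairs 1000000 reads
```

This version writes uncompressed intermediate files before gzipping them, so it needs more temporary disk space than the piped version.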

Windows users:

"Get that linux feeling ... on windows" cygwin: https://www.cygwin.com/
