TACOCAT version 0.0.1 - A pipeline for de novo transcriptome assembly.

Available from:

TACOCAT - Trim And COnstruct Combined Assembly Transcriptome

TACOCAT was built to streamline the de novo transcriptome assembly process and create a robust reference for downstream analysis. This pipeline starts with raw paired-end Illumina reads in fastq format, quality trims the reads, assembles the reads using multiple assemblers and different k-mer sizes, and outputs a reference classified into primary and alternate sets of transcripts.


TACOCAT runs as a single command. It takes paired-end Illumina fastq files as input (two per species), writes a number of directories containing the intermediate files from each step of the pipeline, and produces the final reference transcriptome classified into primary and alternate transcripts. Classification is based on CDS-DNA local alignment identity: CDS that are perfect fragments of another CDS are dropped, and those with some CDS base differences are kept, with the longest CDS as the primary transcript. Alternates are any alternatively spliced transcripts.
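The classification rule described above can be illustrated with a toy sketch. This is not how EvidentialGene works internally: the real classification relies on local alignment identity, whereas this example uses an exact substring test as a stand-in for a "perfect fragment" and assumes all sequences passed in belong to the same locus. The function name and the transcript ids are hypothetical.

```python
def classify_cds(cds_by_id):
    """Toy classification of the CDS at one locus.

    Assumption: exact substring match stands in for a "perfect fragment";
    the actual pipeline uses alignment identity instead.
    Returns (primary_id, alternate_ids, dropped_ids).
    """
    # Visit the longest CDS first; ties are broken by id so the result is stable.
    ordered = sorted(cds_by_id.items(), key=lambda item: (-len(item[1]), item[0]))
    kept, dropped = [], []
    for transcript_id, seq in ordered:
        # A CDS fully contained in an already kept CDS is a perfect fragment.
        if any(seq in kept_seq for _, kept_seq in kept):
            dropped.append(transcript_id)
        else:
            kept.append((transcript_id, seq))
    primary = kept[0][0] if kept else None
    alternates = [transcript_id for transcript_id, _ in kept[1:]]
    return primary, alternates, dropped
```

With three hypothetical CDS, where one is an exact fragment of the longest and another differs by a base, the fragment is dropped, the longest becomes the primary, and the variant is kept as an alternate.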

INPUT - two fastq files: one file for forward reads and one file for reverse reads. If you have multiple forward and reverse read files from replicate conditions or sequencing runs, combine them before using the pipeline, joining the forward files and the reverse files in the same order so that read pairs stay matched.
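One way to combine several read files is sketched below. It assumes uncompressed fastq files that each end with a newline and use the usual four lines per record; the function names and the file names in the usage comment are hypothetical and not part of TACOCAT.

```python
import shutil
from pathlib import Path


def combine_fastq(parts, combined):
    """Concatenate fastq files, in the order given, into one file."""
    with open(combined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(combined)


def count_reads(fastq):
    """Count records, assuming four lines per fastq record."""
    with open(fastq, "rb") as handle:
        return sum(1 for _ in handle) // 4


# Hypothetical usage: list the runs in the same order for both directions.
# forward = combine_fastq(["run1_R1.fastq", "run2_R1.fastq"], "A_R1.fastq")
# reverse = combine_fastq(["run1_R2.fastq", "run2_R2.fastq"], "A_R2.fastq")
# assert count_reads(forward) == count_reads(reverse)
```

Comparing the read counts of the combined forward and reverse files is a cheap sanity check that no run was left out of one direction; it does not prove that the pairs are in the same order.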

If you want to run the pipeline using multiple processors in parallel, modify config.yaml to specify the number of processors you wish to use.

Installing Programs and Dependencies

The TACOCAT pipeline is written to run on Linux using the Snakemake workflow management system and requires the following programs and dependencies to be installed and in the system path:

  1. Python 3.x

  2. Snakemake

  3. Trimmomatic v.0.36

  4. Trinity

  5. Velvet v.1.2.10

  6. Oases v.0.2.08

  7. EvidentialGene

  8. bowtie

  9. fastanrdb

  10. cd-hit, cd-hit-est

  11. BLAST+

Brief instructions are given below, although users may wish to refer to the installation notes provided with these packages for more detailed instructions.
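Before running the pipeline, it can help to confirm that the programs are actually visible in the system path. The sketch below uses Python's standard `shutil.which` for this. The executable names are assumptions based on how these packages are commonly installed (for example `velveth` and `velvetg` for Velvet, `blastn` and `makeblastdb` for BLAST+); the names TACOCAT itself calls may differ, and EvidentialGene scripts are not covered.

```python
import shutil

# Assumed executable names; adjust to match your installation.
REQUIRED_TOOLS = [
    "snakemake", "java", "Trinity", "velveth", "velvetg", "oases",
    "bowtie", "fastanrdb", "cd-hit", "cd-hit-est", "blastn", "makeblastdb",
]


def missing_tools(names=REQUIRED_TOOLS):
    """Return the names that cannot be found in the system path."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found in PATH:", ", ".join(missing))
    else:
        print("All listed programs were found in PATH.")
```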



Source code and documentation can be found at


Trimmomatic

Trimmomatic requires Java to access and run jar files. Binaries, source code, and documentation can be found at


Trinity

Source code and documentation can be found at


Velvet

Source code and manual can be found at


Oases

Source code and manual can be found at


EvidentialGene

TACOCAT uses scripts from the (/scripts/rnaseq/) and (/scripts/prot/) directories. Source code can be found at
Additional documentation about the program can be found at


bowtie

Source code and documentation can be found at


fastanrdb

Part of the Exonerate package. Source code and documentation can be found at

cd-hit, cd-hit-est

Source code and documentation can be found at


BLAST+

Executables are found here (instructions are currently at and in more detail in the BLAST+ user manual). As websites can change, an alternative is to search online for "install BLAST+".

  1. Instructions are provided for installing BLAST+ on various flavors of Linux on the 'Standalone BLAST setup for Unix' page of the BLAST+ Help manual currently at
  2. Follow the instructions under "Configuration" in the BLAST+ help manual to add BLAST+ to the PATH environment variable.

Setting up and running TACOCAT

Once the required programs and dependencies have been installed, TACOCAT can be set up and run on the small example data included in the package as follows:

  1. Save TACOCAT_v.0.0.1.tar.gz
  2. Open a terminal and cd to the directory where TACOCAT_v.0.0.1.tar.gz was saved
  3. tar xzf TACOCAT_v.0.0.1.tar.gz
  4. cd TACOCAT_v.0.0.1
  5. snakemake EviGene/{sample}.evigene.fasta, replacing {sample} with the name of each sample

The command for running TACOCAT on the example dataset is:

snakemake EviGene/{A,B}.evigene.fasta
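In bash, the braces in {A,B} are expanded by the shell before Snakemake sees them, so the command above requests two targets: EviGene/A.evigene.fasta and EviGene/B.evigene.fasta. For scripted runs, the same target list can be built explicitly. The sketch below assumes only the target naming pattern shown above; the function names are hypothetical.

```python
import subprocess


def evigene_targets(samples):
    """Build one EviGene target path per sample name."""
    return [f"EviGene/{sample}.evigene.fasta" for sample in samples]


def snakemake_command(samples):
    """Return the argument list equivalent to the shell command above."""
    return ["snakemake", *evigene_targets(samples)]


def run_tacocat(samples):
    """Run Snakemake from the TACOCAT directory for the given samples."""
    subprocess.run(snakemake_command(samples), check=True)


# Hypothetical usage, from inside the unpacked TACOCAT directory:
# run_tacocat(["A", "B"])
```

Passing the arguments as a list avoids relying on shell brace expansion, which is not available in every shell.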