Caio Raposo edited this page Nov 3, 2021 · 1 revision

Trim


The raw-fastq directory contains untrimmed FASTQ data, so the first step is to clean the reads of adapter contamination and low-quality bases. We will use the trim command, which is a wrapper around Illumiprocessor, a tool written by Faircloth. The command takes two arguments: a directory with the raw reads and a CSV file with adapter and barcode information. Here we use raw-fastq and sheet.csv.

You can prepare a table as follows and save it as sheet.csv.

Sample       i7_Barcode  i5_Barcode  i7_Adapter                                                   i5_Adapter
Bali-B43     TACAGC                  AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT
BTC580-B44   TATAAT
DOT6948-B46  TCCCGA
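
Before trimming, it can help to confirm that every sample named in the first column has both read files in raw-fastq. The following is a minimal sketch, not part of UCEasy: the helper names has_match and check_raw_pairs are hypothetical, and the file pattern <sample>_*_R1_*.fastq.gz / <sample>_*_R2_*.fastq.gz is an assumption based on the raw file names used in this tutorial (for example Bali-B43_S43_L001_R1_001.fastq.gz). How UCEasy itself matches sample names to files is not covered here.

```shell
# Hypothetical helper: succeeds if the glob passed as arguments matched an
# existing file (an unmatched glob stays a literal string that does not exist).
has_match() {
    [ -e "$1" ]
}

# Hypothetical pre-flight check. Usage: check_raw_pairs RAW_DIR SAMPLE...
# Assumes raw files are named <sample>_*_R1_*.fastq.gz and <sample>_*_R2_*.fastq.gz.
check_raw_pairs() {
    raw_dir="$1"
    shift
    status=0
    for sample in "$@"; do
        for mate in R1 R2; do
            if ! has_match "$raw_dir/${sample}"_*_"${mate}"_*.fastq.gz; then
                echo "missing $mate file for $sample"
                status=1
            fi
        done
    done
    return "$status"
}
```

For this tutorial the call would be check_raw_pairs raw-fastq Bali-B43 BTC580-B44 DOT6948-B46. It prints one line per missing file and returns a non-zero status if anything is missing.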

After preparing sheet.csv, you are ready to run the following UCEasy command:

$ uceasy trim raw-fastq/ sheet.csv

This creates the clean-fastq directory with the following structure:

clean-fastq
├── Bali-B43
│  ├── adapters.fasta
│  ├── raw-reads
│  │  ├── Bali-B43-READ1.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/Bali-B43_S43_L001_R1_001.fastq.gz
│  │  └── Bali-B43-READ2.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/Bali-B43_S43_L001_R2_001.fastq.gz
│  ├── split-adapter-quality-trimmed
│  │  ├── Bali-B43-READ-singleton.fastq.gz
│  │  ├── Bali-B43-READ1.fastq.gz
│  │  └── Bali-B43-READ2.fastq.gz
│  └── stats
│     └── Bali-B43-adapter-contam.txt
├── BTC580-B44
│  ├── adapters.fasta
│  ├── raw-reads
│  │  ├── BTC580-B44-READ1.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/BTC580-B44_S44_L001_R1_001.fastq.gz
│  │  └── BTC580-B44-READ2.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/BTC580-B44_S44_L001_R2_001.fastq.gz
│  ├── split-adapter-quality-trimmed
│  │  ├── BTC580-B44-READ-singleton.fastq.gz
│  │  ├── BTC580-B44-READ1.fastq.gz
│  │  └── BTC580-B44-READ2.fastq.gz
│  └── stats
│     └── BTC580-B44-adapter-contam.txt
...

The important part here is the split-adapter-quality-trimmed directory, which holds the trimmed reads that the assemblers will consume.
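
As a quick sanity check on the trimmed output, the read counts per sample can be tallied from the files in split-adapter-quality-trimmed. This is a rough sketch under stated assumptions, not a UCEasy feature: the function names count_reads and summarize_clean_fastq are hypothetical, the files are assumed to follow the <sample>-READ1.fastq.gz / <sample>-READ2.fastq.gz naming shown in the tree above, and each FASTQ record is assumed to span exactly four lines.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file,
# assuming four lines per record.
count_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical summary: print "<sample> <READ1 count> <READ2 count>"
# (tab-separated) for each sample directory under clean-fastq.
# Usage: summarize_clean_fastq [CLEAN_DIR]   (defaults to clean-fastq)
summarize_clean_fastq() {
    clean_dir="${1:-clean-fastq}"
    for sample_dir in "$clean_dir"/*/; do
        sample_dir="${sample_dir%/}"
        sample="$(basename "$sample_dir")"
        trimmed="$sample_dir/split-adapter-quality-trimmed"
        # Skip anything that is not a trimmed sample directory.
        [ -d "$trimmed" ] || continue
        r1="$(count_reads "$trimmed/$sample-READ1.fastq.gz")"
        r2="$(count_reads "$trimmed/$sample-READ2.fastq.gz")"
        printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    done
}
```

Running summarize_clean_fastq clean-fastq from the tutorial directory prints one line per sample. The READ1 and READ2 counts should be equal, since unpaired reads go to the separate READ-singleton file.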