Trim
The directory `raw-fastq` contains untrimmed FASTQ data, so the first step is to clean the reads of adapter contamination and low-quality bases. We will use the `trim` command, a wrapper around Illumiprocessor, a tool written by Faircloth. The command needs a directory with the raw reads and a CSV file with adapter and barcode information. Here we use `raw-fastq` and `sheet.csv`.
Prepare a table like the following and save it as `sheet.csv`:
| Sample | i7_Barcode | i5_Barcode | i7_Adapter | i5_Adapter |
|---|---|---|---|---|
| Bali-B43 | TACAGC | | `AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG` | `AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT` |
| BTC580-B44 | TATAAT | | | |
| DOT6948-B46 | TCCCGA | | | |
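If you prefer to generate `sheet.csv` programmatically, the minimal sketch below writes the table with Python's standard `csv` module. It copies the values exactly as they appear in the table, with the adapters only in the first row. The header names are an assumption (the last two columns are taken to hold the i7 and i5 adapter sequences), so check which column names UCEasy expects before relying on it.

```python
import csv
from pathlib import Path

# Illustrative header: the last two columns are assumed to hold the
# i7 and i5 adapter sequences. Confirm the names UCEasy expects.
HEADER = ["Sample", "i7_Barcode", "i5_Barcode", "i7_Adapter", "i5_Adapter"]

# Adapter sequences copied verbatim from the table (the "*" included).
I7_ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC*ATCTCGTATGCCGTCTTCTGCTTG"
I5_ADAPTER = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT"

# One row per sample; empty strings become empty CSV cells.
ROWS = [
    ["Bali-B43", "TACAGC", "", I7_ADAPTER, I5_ADAPTER],
    ["BTC580-B44", "TATAAT", "", "", ""],
    ["DOT6948-B46", "TCCCGA", "", "", ""],
]


def write_sheet(path: Path) -> None:
    """Write HEADER and ROWS to `path` as comma-separated values."""
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(HEADER)
        writer.writerows(ROWS)
```

Calling `write_sheet(Path("sheet.csv"))` from the tutorial directory produces the file next to `raw-fastq`. Building the rows in code makes it harder to misalign a cell than editing a wide table by hand.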
After preparing `sheet.csv`, you are ready to run the following UCEasy command:
```
$ uceasy trim raw-fastq/ sheet.csv
```
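Before running `trim`, it can help to confirm that every sample listed in `sheet.csv` has both read files in `raw-fastq`, so naming mismatches surface early. The helper below, `missing_reads`, is hypothetical and not part of UCEasy. It assumes the Illumina-style file names visible in the output tree (for example `Bali-B43_S43_L001_R1_001.fastq.gz`) and that the first CSV column holds the sample name.

```python
from __future__ import annotations

import csv
from pathlib import Path


def missing_reads(fastq_dir: Path, sheet: Path) -> dict[str, list[str]]:
    """Return {sample: [missing read labels]} for samples lacking R1 or R2.

    Hypothetical helper, not part of UCEasy. Assumes Illumina-style
    names such as Bali-B43_S43_L001_R1_001.fastq.gz and that the first
    column of the sheet holds the sample name.
    """
    with sheet.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row (if any)
        samples = [row[0].strip() for row in reader if row and row[0].strip()]

    problems: dict[str, list[str]] = {}
    for sample in samples:
        # "Bali-B43_*_R1_*.fastq.gz" matches "Bali-B43_S43_L001_R1_001.fastq.gz"
        absent = [
            read
            for read in ("R1", "R2")
            if not any(fastq_dir.glob(f"{sample}_*_{read}_*.fastq.gz"))
        ]
        if absent:
            problems[sample] = absent
    return problems
```

For example, `missing_reads(Path("raw-fastq"), Path("sheet.csv"))` returns an empty dictionary when every sample has both an R1 and an R2 file. Any entry it returns names the sample and the read that could not be found.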
The `trim` command creates the directory `clean-fastq` with the following structure:
```
clean-fastq
├── Bali-B43
│   ├── adapters.fasta
│   ├── raw-reads
│   │   ├── Bali-B43-READ1.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/Bali-B43_S43_L001_R1_001.fastq.gz
│   │   └── Bali-B43-READ2.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/Bali-B43_S43_L001_R2_001.fastq.gz
│   ├── split-adapter-quality-trimmed
│   │   ├── Bali-B43-READ-singleton.fastq.gz
│   │   ├── Bali-B43-READ1.fastq.gz
│   │   └── Bali-B43-READ2.fastq.gz
│   └── stats
│       └── Bali-B43-adapter-contam.txt
├── BTC580-B44
│   ├── adapters.fasta
│   ├── raw-reads
│   │   ├── BTC580-B44-READ1.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/BTC580-B44_S44_L001_R1_001.fastq.gz
│   │   └── BTC580-B44-READ2.fastq.gz -> /home/caio/uceasy-tutorial/raw-fastq/BTC580-B44_S44_L001_R2_001.fastq.gz
│   ├── split-adapter-quality-trimmed
│   │   ├── BTC580-B44-READ-singleton.fastq.gz
│   │   ├── BTC580-B44-READ1.fastq.gz
│   │   └── BTC580-B44-READ2.fastq.gz
│   └── stats
│       └── BTC580-B44-adapter-contam.txt
...
```
The important part is the `split-adapter-quality-trimmed` directory, which holds the trimmed `READ1`, `READ2` and singleton files that the assemblers will consume.
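To confirm that every sample produced its trimmed reads before moving on to assembly, the hypothetical helper `trimmed_reads` below (not part of UCEasy) walks `clean-fastq` and maps each sample to whichever of its `READ1`, `READ2` and singleton files exist. It is a sketch that assumes the file naming shown in the tree above.

```python
from __future__ import annotations

from pathlib import Path

# File-name suffixes as they appear in the tree above.
SUFFIXES = {
    "READ1": "READ1.fastq.gz",
    "READ2": "READ2.fastq.gz",
    "singleton": "READ-singleton.fastq.gz",
}


def trimmed_reads(clean_dir: Path) -> dict[str, dict[str, Path]]:
    """Map each sample directory in clean_dir to its existing trimmed files.

    Hypothetical helper, not part of UCEasy.
    """
    result: dict[str, dict[str, Path]] = {}
    for sample_dir in sorted(p for p in clean_dir.iterdir() if p.is_dir()):
        name = sample_dir.name
        trimmed = sample_dir / "split-adapter-quality-trimmed"
        # Keep only the files that are actually present on disk.
        result[name] = {
            label: trimmed / f"{name}-{suffix}"
            for label, suffix in SUFFIXES.items()
            if (trimmed / f"{name}-{suffix}").is_file()
        }
    return result
```

Running `trimmed_reads(Path("clean-fastq"))` after `trim` finishes gives one entry per sample. A sample whose entry lacks `READ1` or `READ2` likely did not trim successfully and is worth investigating before assembly.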