
Treat Fastqs Program

sarpiens edited this page Jul 11, 2023 · 8 revisions

Description

The Treat Fastqs program applies treatment operations to the downloaded fastq files, following the treatment information provided in a Treatment Template file. This program belongs to the Optional Programs group, which means this step can be skipped if your fastq files need no further treatment.

The provided fastq files can be treated in three different modes:

  • Merge Mode. This treatment merges the fastq files that share a sample name, generating a new merged file in the Output Directory, provided the combination of fastq types is a permitted configuration. It must be indicated with “merge” in the treatment column. Permitted configurations for this mode are:

    1. An equal number of PAIRED and SINGLE Fastq files with more than 1 file per Fastq type [Number of pair1(s) > 1; Number of pair2(s) > 1; Number of single(s) > 1; Number of pair1(s) = Number of pair2(s) = Number of single(s)].
    2. An equal number of PAIRED Fastq files with more than 1 file per fastq type [Number of pair1(s) > 1; Number of pair2(s) > 1; Number of single(s) = 0; Number of pair1(s) = Number of pair2(s)].
    3. More than one SINGLE Fastq file [Number of pair1(s) = 0; Number of pair2(s) = 0; Number of single(s) > 1].
  • Rename Mode. This treatment renames the file to the value indicated in the sample name column, generating a new file in the Output Directory, provided the combination of fastq types is a permitted configuration. It must be indicated with “rename” in the treatment column. Permitted configurations for this mode are:

    1. A pair of PAIRED Fastq files with a unique SINGLE Fastq file [Number of pair1(s) = 1; Number of pair2(s) = 1; Number of single(s) = 1].
    2. A pair of PAIRED Fastq files [Number of pair1(s) = 1; Number of pair2(s) = 1; Number of single(s) = 0].
    3. A unique SINGLE Fastq file [Number of pair1(s) = 0; Number of pair2(s) = 0; Number of single(s) = 1].
  • Copy Mode. This treatment will copy the specified fastq file to the Output Directory, ignoring the value in the sample name column. It must be indicated with “copy” in the treatment column.
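
The permitted configurations above can be expressed as a small check on the number of files per fastq type. The following Python sketch is only an illustration, not the program's actual code: the function name `is_permitted` is hypothetical, the labels `pair1` and `pair2` are taken from the bracket notation above and assumed to match the values used in the fastq type column, and Copy Mode is assumed to have no restrictions because none are listed.

```python
from collections import Counter


def is_permitted(treatment, fastq_types):
    """Check whether the fastq types of one sample form a permitted configuration.

    `fastq_types` lists the type of every file sharing a sample name,
    e.g. ["pair1", "pair2"] (labels assumed from the notation above).
    """
    counts = Counter(fastq_types)
    p1, p2, s = counts["pair1"], counts["pair2"], counts["single"]

    if treatment == "merge":
        return (
            (p1 > 1 and p1 == p2 == s)             # 1. paired and single, several of each
            or (p1 > 1 and p1 == p2 and s == 0)    # 2. paired only, several of each
            or (p1 == 0 and p2 == 0 and s > 1)     # 3. more than one single
        )
    if treatment == "rename":
        # Exactly one file per type, in one of the three listed combinations.
        return (p1, p2, s) in {(1, 1, 1), (1, 1, 0), (0, 0, 1)}
    if treatment == "copy":
        return True  # assumption: no configuration restrictions are listed for copy
    raise ValueError(f"Unknown treatment: {treatment!r}")
```

For example, `is_permitted("merge", ["single", "single"])` is true, while `is_permitted("rename", ["pair1"])` is false because a lone pair1 file has no matching pair2.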

For instance, if we had the following PROJECT_treatment_template.tsv:

sample_name  fastq_file_name    fastq_type  treatment
Sample0      ERR12233.fastq.gz  single      copy
Sample1      ERR12234.fastq.gz  single      rename
Sample2      ERR12235.fastq.gz  single      merge
Sample2      ERR12236.fastq.gz  single      merge

The program would perform the following treatment on the fastq files:
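
As a rough sketch of that outcome in Python: the code below reads the example template and maps each output file to the input files it would be built from. It rests on assumptions that this page does not confirm: renamed and merged files are assumed to be named after the sample name plus the fastq pattern, copied files are assumed to keep their original name, and only SINGLE files are handled. The function name `plan_treatments` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# The example template from above, tab-separated as the program expects.
TEMPLATE = (
    "sample_name\tfastq_file_name\tfastq_type\ttreatment\n"
    "Sample0\tERR12233.fastq.gz\tsingle\tcopy\n"
    "Sample1\tERR12234.fastq.gz\tsingle\trename\n"
    "Sample2\tERR12235.fastq.gz\tsingle\tmerge\n"
    "Sample2\tERR12236.fastq.gz\tsingle\tmerge\n"
)


def plan_treatments(template_text, fastq_pattern=".fastq.gz"):
    """Map each expected output file name to the list of input files behind it."""
    plan = defaultdict(list)
    for row in csv.DictReader(io.StringIO(template_text), delimiter="\t"):
        if row["treatment"] == "copy":
            # Copy ignores the sample name, so the file name is kept (assumption).
            output_name = row["fastq_file_name"]
        else:
            # Rename and merge: output named after the sample (assumption).
            output_name = row["sample_name"] + fastq_pattern
        plan[output_name].append(row["fastq_file_name"])
    return dict(plan)


for output_name, inputs in plan_treatments(TEMPLATE).items():
    print(output_name, "<-", ", ".join(inputs))
```

Under these assumptions, Sample0's file is copied unchanged, Sample1's file becomes `Sample1.fastq.gz`, and the two Sample2 files are merged into `Sample2.fastq.gz`.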

Input Elements:

Input                           Type       Description
PROJECT_treatment_template.tsv  File       Final Curated Treatment Template
/directory/path/input           Directory  Downloaded Fastqs Directory
/directory/path/output          Directory  Treated Fastqs Directory

Output Elements:

Output           Type   Description
sample.fastq.gz  Files  Various Treated Fastq Files

The resulting files are the final treated fastq files. To get a general idea of the optional treatment steps of the workflow, check the workflow's diagram.

Arguments

Usage:

treat_fastqs [-h] -t TREATMENT_TEMPLATE -i INPUT_DIRECTORY -o OUTPUT_DIRECTORY 
             [-p FASTQ_PATTERN] [-r1 R1_PATTERN] [-r2 R2_PATTERN] [-x] [-v]

Options:

Parameter                 Description
-h, --help                Show help message and exit.
-t, --treatment_template  Treatment Template [Expected sep=TABS]. Indicate the path to the Treatment Template file.
-i, --input_directory     Input Directory. Indicate the path to the Input Directory with the Fastq files to treat.
-o, --output_directory    Output Directory. Indicate the path to the Output Directory to save the resulting treated Fastq files.
-p, --fastq_pattern       Fastq File Pattern (Optional) [Default:".fastq.gz"]. Indicate the pattern to identify Fastq files.
-r1, --r1_pattern         R1 File Pattern (Optional) [Default:"_1.fastq.gz"]. Indicate the pattern to identify R1 PAIRED Fastq files.
-r2, --r2_pattern         R2 File Pattern (Optional) [Default:"_2.fastq.gz"]. Indicate the pattern to identify R2 PAIRED Fastq files.
-x, --plain_text          Plain Text Mode (Optional). If indicated, it will enable Plain Text mode, and text will appear without colors.
-v, --version             Show program's version number and exit.
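
To make the required and optional arguments concrete, here is a minimal sketch of how an interface like this could be declared with Python's `argparse`. It mirrors the options and defaults documented above but is not the program's actual source; `build_parser` is a hypothetical name, and the `-v/--version` flag is left out because the version string is not stated on this page (`-h/--help` is added by `argparse` automatically).

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented treat_fastqs options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="treat_fastqs")
    # Required arguments.
    parser.add_argument("-t", "--treatment_template", required=True,
                        help="Path to the tab-separated Treatment Template file.")
    parser.add_argument("-i", "--input_directory", required=True,
                        help="Directory with the Fastq files to treat.")
    parser.add_argument("-o", "--output_directory", required=True,
                        help="Directory where treated Fastq files are saved.")
    # Optional patterns, with the documented defaults.
    parser.add_argument("-p", "--fastq_pattern", default=".fastq.gz",
                        help="Pattern to identify Fastq files.")
    parser.add_argument("-r1", "--r1_pattern", default="_1.fastq.gz",
                        help="Pattern to identify R1 PAIRED Fastq files.")
    parser.add_argument("-r2", "--r2_pattern", default="_2.fastq.gz",
                        help="Pattern to identify R2 PAIRED Fastq files.")
    # Flag: present means plain text output without colors.
    parser.add_argument("-x", "--plain_text", action="store_true",
                        help="Print text without colors.")
    return parser


args = build_parser().parse_args(
    ["-t", "PROJECT_treatment_template.tsv", "-i", "downloads", "-o", "treated_files"]
)
print(args.fastq_pattern, args.r1_pattern, args.r2_pattern, args.plain_text)
```

Omitting any of `-t`, `-i`, or `-o` makes this sketch exit with a usage error, which matches the usage line where only those three appear outside square brackets.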

Examples

Commands:

  • Treat Fastqs with colored text stdout:
    treat_fastqs -t treatment_template_filtered_PRJEB10949_merged_metadata_example.tsv -i downloads -o treated_files
  • Treat Fastqs with plain text stdout:
    treat_fastqs -t treatment_template_filtered_PRJEB10949_merged_metadata_example.tsv -i downloads -o treated_files --plain_text
  • Treat Fastqs using ".fq.gz" instead of the default ".fastq.gz" Fastq Pattern (the R1 and R2 patterns are changed to match):
    treat_fastqs -t treatment_template_PROJECT_metadata_files_other_fastq_extension.tsv -i downloads -o treated_files -p ".fq.gz" -r1 "_1.fq.gz" -r2 "_2.fq.gz"
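
The last command changes all three patterns together, because the defaults would no longer match any ".fq.gz" file. The Python sketch below shows why, under the assumption that each pattern is matched against the end of the file name. How the program actually applies the patterns is not described on this page, and `classify_fastq` is a hypothetical helper.

```python
def classify_fastq(file_name, fastq_pattern=".fastq.gz",
                   r1_pattern="_1.fastq.gz", r2_pattern="_2.fastq.gz"):
    """Classify a file name as pair1, pair2 or single by its suffix (assumed matching rule)."""
    # The paired patterns are checked first because they also end with the fastq pattern.
    if file_name.endswith(r1_pattern):
        return "pair1"
    if file_name.endswith(r2_pattern):
        return "pair2"
    if file_name.endswith(fastq_pattern):
        return "single"
    return None  # not recognized as a Fastq file under these patterns


# With the default patterns a ".fq.gz" file is not recognized at all.
print(classify_fastq("ERR12233_1.fq.gz"))
# With the patterns from the command above it is identified as an R1 file.
print(classify_fastq("ERR12233_1.fq.gz", ".fq.gz", "_1.fq.gz", "_2.fq.gz"))
```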

For a full and detailed example of dataset curation, see the Tutorial Full Example page, which is particularly recommended for this program.