I have forward and reverse FASTQ files from Illumina paired-end sequencing, and I'm merging these reads using vsearch, which works great. However, I need to filter the reads based on a quality score (Qscore) threshold of 30 before merging. Specifically, within each forward and reverse FASTQ file, I want to discard any reads that have a quality score below this threshold. Once the reads are filtered, I want to perform a paired-end merge on the filtered reads.
Did I miss something? Also, is this the right way to approach the problem?
Here’s the code I’m using and the error I encountered during filtering:
import os
import subprocess

def filter_and_merge_reads(r1_path, r2_path, output_dir, qscore_threshold=30, merge_override=False):
    filtered_r1_path = os.path.join(output_dir, "filtered_r1.fastq")
    filtered_r2_path = os.path.join(output_dir, "filtered_r2.fastq")

    # Filter R1 and R2 reads
    subprocess.run(f"vsearch --fastq_filter {r1_path} --fastqout {filtered_r1_path} --fastq_qmin {qscore_threshold}", shell=True, check=True)
    subprocess.run(f"vsearch --fastq_filter {r2_path} --fastqout {filtered_r2_path} --fastq_qmin {qscore_threshold}", shell=True, check=True)

    # Merge filtered reads
    merged_output_prefix = os.path.join(output_dir, "merged")
    merged_output_file = f"{merged_output_prefix}.fastq"
    if not os.path.exists(merged_output_file) or merge_override:
        subprocess.run(f"vsearch --fastq_mergepairs {filtered_r1_path} --reverse {filtered_r2_path} --fastqout {merged_output_file}", shell=True, check=True)
    else:
        print(f"Using existing merged file: {merged_output_file}")
    return merged_output_file

# Example usage
r1_path = "forward.fastq.gz"
r2_path = "reverse.fastq.gz"
output_dir = '../results/'
filter_and_merge_reads(r1_path, r2_path, output_dir, qscore_threshold=30, merge_override=False)

This is the error I'm seeing upon running the script:
vsearch v2.28.1_linux_x86_64, 124.5GB RAM, 16 cores
https://github.com/torognes/vsearch
Reading input file
Fatal error: FASTQ quality value (16) below qmin (30)
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
Cell In[35], line 36
34 r2_path = "reverse.fastq.gz"
35 output_dir = '../results/'
---> 36 filter_and_merge_reads(r1_path, r2_path, output_dir, qscore_threshold=30, merge_override=False)
Cell In[35], line 19, in filter_and_merge_reads(r1_path, r2_path, output_dir, qscore_threshold, merge_override)
16 filtered_r2_path = os.path.join(output_dir, "filtered_r2.fastq")
18 # Filter R1 and R2 reads
---> 19 subprocess.run(f"vsearch --fastq_filter {r1_path} --fastqout {filtered_r1_path} --fastq_qmin {qscore_threshold}", shell=True, check=True)
20 subprocess.run(f"vsearch --fastq_filter {r2_path} --fastqout {filtered_r2_path} --fastq_qmin {qscore_threshold}", shell=True, check=True)
22 # Merge filtered reads
File /opt/conda/lib/python3.9/subprocess.py:528, in run(input, capture_output, timeout, check, *popenargs, **kwargs)
526 retcode = process.poll()
527 if check and retcode:
--> 528 raise CalledProcessError(retcode, process.args,
529 output=stdout, stderr=stderr)
530 return CompletedProcess(process.args, retcode, stdout, stderr)
CalledProcessError: Command 'vsearch --fastq_filter forward.fastq.gz --fastqout ../results/filtered_r1.fastq --fastq_qmin 30' returned non-zero exit status 1.
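For context on the error: in vsearch, --fastq_qmin sets the lowest quality value accepted in the input file (default 0), so a value of 30 makes vsearch abort on the first base below Q30 instead of discarding the read. It also matters that the two files are filtered separately, since dropping a read from only one file would leave the pairs out of sync for merging. Below is a minimal pure-Python sketch of the filtering I have in mind, not a vsearch feature. It assumes Phred+33 encoding, four-line FASTQ records, and R1/R2 files listing reads in the same order; the helper names read_fastq, min_phred, filter_pairs and write_fastq are hypothetical.

```python
import gzip
from itertools import islice


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a plain or gzipped FASTQ file.

    Assumes strict four-line records (no wrapped sequence lines).
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            record = list(islice(handle, 4))
            if len(record) < 4:
                break
            header, sequence, _plus, quality = (line.rstrip("\n") for line in record)
            yield header, sequence, quality


def min_phred(quality, offset=33):
    """Return the lowest Phred score in a quality string (Phred+33 assumed)."""
    return min((ord(ch) - offset for ch in quality), default=0)


def filter_pairs(r1_records, r2_records, threshold=30):
    """Yield only the pairs in which every base of both mates is >= threshold.

    A pair is dropped as a whole, so the two outputs stay synchronized.
    """
    for r1, r2 in zip(r1_records, r2_records):
        if min_phred(r1[2]) >= threshold and min_phred(r2[2]) >= threshold:
            yield r1, r2


def write_fastq(records, path):
    """Write (header, sequence, quality) tuples as uncompressed FASTQ."""
    with open(path, "w") as handle:
        for header, sequence, quality in records:
            handle.write(f"{header}\n{sequence}\n+\n{quality}\n")
```

The surviving pairs could be written to two files with write_fastq and then passed to vsearch --fastq_mergepairs as in the script above. Requiring every base to be at least Q30 is very strict and is likely to discard a large share of typical Illumina reads, so options such as --fastq_maxee or --fastq_truncqual may be closer to what is actually needed.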