### Step 1: Combine Flow Cell Data

Concatenate the FASTQ files from both flow cells into a single file. Check basic statistics, such as total read count
and file size, to confirm that the files were combined successfully.

In [None]:
!cd reads && sh sample_reads.sh

The script writes 1 million subsampled reads to `/data/groups/wheelenj/sequencing/20250916_M009242/planaria_test_subset.fastq.gz`. This test subset is the input for the steps that follow.
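The combination and read-count check described above can be sketched in Python. This is a minimal sketch, not the contents of `sample_reads.sh`: the function names are hypothetical, and it assumes gzip-compressed FASTQ files with the standard four-line record layout (no wrapped sequence lines).

```python
import gzip
import shutil


def concatenate_files(inputs, output):
    """Concatenate files byte for byte.

    This is valid for .gz inputs because a gzip file may consist of
    several compressed members written back to back.
    """
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_fastq_reads(path):
    """Count records in a gzipped FASTQ, assuming 4 lines per record."""
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4
```

If the combination worked, the read count of the combined file should equal the sum of the read counts of the per-flow-cell files.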

### Step 2: Initial Quality Assessment

Run NanoPlot on the combined dataset to assess read length distribution, quality scores, and overall data
characteristics. Optionally check for adapter contamination using Porechop.

In [None]:
!mkdir -p fastq_planaria/qc
!NanoPlot --fastq /data/groups/wheelenj/sequencing/20250916_M009242/planaria_test_subset.fastq.gz -o fastq_planaria/qc

In [None]:
!multiqc fastq_planaria/.
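When reading the quality plots, it helps to know how a per-read mean quality can be derived from a FASTQ quality string. The sketch below assumes Phred+33 encoding and averages error probabilities rather than raw Phred scores, which is a common convention for long reads. The function name is hypothetical, and this is not a copy of NanoPlot's code.

```python
import math


def mean_read_quality(quality_string, offset=33):
    """Mean Phred quality of one read, averaged in error-probability space."""
    if not quality_string:
        raise ValueError("empty quality string")
    # Convert each Phred score Q to an error probability 10^(-Q/10).
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    mean_prob = sum(probs) / len(probs)
    # Convert the mean error probability back to a Phred score.
    return -10 * math.log10(mean_prob)
```

Averaging in probability space gives a lower value than the arithmetic mean of the Phred scores whenever a read mixes high- and low-quality bases, because the low-quality bases dominate the error rate.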

### Step 3: Read Filtering and Subsampling

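Use Filtlong to filter reads by quality and length, targeting 100-150 GB of high-quality data
(approximately 50-100x coverage). Planned filter parameters: minimum length 1000 bp, minimum mean quality 8,
and keep the best 20-25% of reads. Note that the test command in this step uses `--keep_percent 80`, which keeps
the best 80% of the data (Filtlong measures this percentage in bases, not read count).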

In [None]:
!mkdir -p trimmed_fastq
!./tools/Filtlong/bin/filtlong --min_length 1000 --keep_percent 80 --min_mean_q 8 /data/groups/wheelenj/sequencing/20250916_M009242/planaria_test_subset.fastq.gz | gzip > trimmed_fastq/trimmed_reads.fastq.gz
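The relationship between a data target and a coverage target is simple arithmetic: coverage is total bases divided by genome size. The sketch below uses hypothetical function names and takes the genome size as an input, since this notebook does not state it.

```python
def expected_coverage(total_bases, genome_size):
    """Fold coverage obtained from a given number of sequenced bases."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_for_coverage(genome_size, coverage):
    """Number of bases needed to reach the requested fold coverage."""
    return genome_size * coverage
```

The result of `bases_for_coverage` could be passed to Filtlong's `--target_bases` option as an alternative to `--keep_percent`, if a fixed amount of data is preferred over a fixed fraction.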

### Step 4: Post-Filtering Quality Control

Run NanoPlot on the filtered reads to assess read length distribution, quality scores, and overall data
characteristics, and compare the results with Step 2 to confirm that filtering had the intended effect.
Optionally check for adapter contamination using Porechop.

In [None]:
!NanoPlot --fastq trimmed_fastq/trimmed_reads.fastq.gz -o trimmed_fastq/qc --plots kde hex
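For a quick numerical comparison of the reads before and after filtering, the read count, total bases, and read-length N50 can be computed directly from the FASTQ files. This is a minimal sketch with hypothetical function names, assuming gzipped FASTQ with four lines per record. NanoPlot's own report remains the fuller source for these statistics.

```python
import gzip


def read_lengths(path):
    """Yield the sequence length of every record in a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                yield len(line.rstrip("\r\n"))


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no reads


def summarize(lengths):
    """Read count, total bases, and N50 for an iterable of read lengths."""
    lengths = list(lengths)
    return {"reads": len(lengths), "bases": sum(lengths), "n50": n50(lengths)}
```

Calling `summarize(read_lengths(path))` on the subset from Step 2 and on `trimmed_fastq/trimmed_reads.fastq.gz` gives two dictionaries that can be compared side by side. After length and quality filtering, the read count and total bases should drop.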

In [None]:
!python tools/Flye/bin/flye --help