# Parabricks Hands-On Workshop

### Tutorial 3: Tumor Sequencing: Somatic Variant Calling Workflow

For tumor sequencing, there will be several analysis to perform:
- Read alignment of tumor sequencing sample
- Read alignment of normal sequencing sample (if doing tumor-normal paired analysis)
- Somatic variant calling
- RNA read alignment (for calling gene fusions from RNA-seq)
- Gene fusion detection

\We will start from downloading the reference genome and sample files, and then proceed through the analysis step-by-step.

#### GPU Monitoring

In [4]:
!nvidia-smi

Mon Sep  1 16:59:12 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-32GB           On  | 00000000:3E:00.0 Off |                    0 |
| N/A   28C    P0              56W / 300W |      0MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
# Run the command below in the terminal
### watch -n 0.5 nvidia-smi
#

### Download Sample Data and References from S3
For this workshop, the files are stored in the Cloud Object Storage (COS). We will download the following files:
- Tumor sample: whole-exome sequencing of HCC1395 breast cancer cell line (SRR7890851)
- Normal sample: whole-exome sequencing of HCC1395BL B lymphocyte cell line, the match normal for HCC1395 (SRR7890850)
- Human reference genome: GRCh38 genome assembly `.fasta` file and its index files (This was already downloaded in Tutorial 1, so we won't do that again)

In [12]:
!mkdir tumor_data
%cd tumor_data
!wget https://cos.twcc.ai/pbworkshop/tumor_sample/SRR7890850_1.fastq.gz
!wget https://cos.twcc.ai/pbworkshop/tumor_sample/SRR7890850_2.fastq.gz
!wget https://cos.twcc.ai/pbworkshop/tumor_sample/SRR7890851_1.fastq.gz
!wget https://cos.twcc.ai/pbworkshop/tumor_sample/SRR7890851_2.fastq.gz
!ls
%cd ..

/home/yingja1227/tumor_data
--2025-09-02 17:25:07--  https://cos.twcc.ai/pbworkshop/tumor_sample/SRR7890850_1.fastq.gz
Resolving cos.twcc.ai (cos.twcc.ai)... 203.145.219.21
Connecting to cos.twcc.ai (cos.twcc.ai)|203.145.219.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1933019989 (1.8G) [application/gzip]
Saving to: ‘SRR7890850_1.fastq.gz.1’


2025-09-02 17:25:24 (110 MB/s) - ‘SRR7890850_1.fastq.gz.1’ saved [1933019989/1933019989]

--2025-09-02 17:25:24--  https://cos.twcc.ai/pbworkshop/tumor_sample/SRR7890850_2.fastq.gz
Resolving cos.twcc.ai (cos.twcc.ai)... 203.145.219.21
Connecting to cos.twcc.ai (cos.twcc.ai)|203.145.219.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2028180298 (1.9G) [application/gzip]
Saving to: ‘SRR7890850_2.fastq.gz.1’


2025-09-02 17:25:40 (119 MB/s) - ‘SRR7890850_2.fastq.gz.1’ saved [2028180298/2028180298]

--2025-09-02 17:25:40--  https://cos.twcc.ai/pbworkshop/tumor_sample/SRR7890851_1.fastq.gz
R

In [8]:
!mkdir tumor_output

The SeqC2 consortium did a series of studies to o establish best practices, reference standards, and benchmark the results of somatic mutation detections under different bioinformatic and laboratory conditions (https://sites.google.com/view/seqc2 paper: https://www.nature.com/articles/s41587-021-00993-6). Download a tumor-normal paired sequencing data for analsis.

### Run Somatic Workflow: BWA-MEM + Mutect2

The Somatic Workflow is one command that analyzes tumor sequencing from raw FASTQ file to somatic variants in VCF format. It performs alignment using BWA-MEM on both tumor and normal sequencing files and uses Mutect2 to generate a single VCF file containing somatic variants. If you wish to perform tumor-only analysis, just omit the `in-normal-fq` and `out-normal-bam` options.

In [8]:
#This took 48 minutes on a V100 32 GB
!pbrun somatic \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-normal-fq tumor_data/SRR7890850_1.fastq.gz tumor_data/SRR7890850_2.fastq.gz "@RG\tID:SRR7890850_rg1\tLB:lib_normal\tPL:ILLUMINA\tSM:normal_sample\tPU:SRR7890850_rg1" \
    --in-tumor-fq tumor_data/SRR7890851_1.fastq.gz tumor_data/SRR7890851_2.fastq.gz "@RG\tID:SRR7890851_rg1\tLB:lib_tumor\tPL:ILLUMINA\tSM:tumor_sample\tPU:SRR7890851_rg1" \
    --bwa-options="-Y" \
    --out-normal-bam tumor_output/normal.bam \
    --out-tumor-bam tumor_output/tumor.bam \
    --out-vcf tumor_output/somatic_variants.vcf \
    --low-memory \
    --mutect-low-memory \
    --num-gpus 1

Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation


[Parabricks Options Mesg]: Read group created for /home/yingja1227/tumor_data/SRR7890851_1.fastq.gz and
/home/yingja1227/tumor_data/SRR7890851_2.fastq.gz
[Parabricks Options Mesg]: @RG\tID:SRR7890851_rg1\tLB:lib_tumor\tPL:ILLUMINA\tSM:tumor_sample\tPU:SRR7890851_rg1
[Parabricks Options Mesg]: Read group created for /home/yingja1227/tumor_data/SRR7890850_1.fastq.gz and
/home/yingja1227/tumor_data/SRR7890850_2.fastq.gz
[Parabricks Options Mesg]: @RG\tID:SRR7890850_rg1\tLB:lib_normal\tPL:ILLUMINA\tSM:normal_sample\tPU:SRR7890850_rg1


[Parabricks Options Mesg]: Checking argument compatibility
[Parabricks Options Mesg]: Set --bwa-options="-K #" to produce compatible pair-ended results with previous versions of
fq2bam or BWA MEM.
[Parabricks Options Mesg]: Read group created for /home/yingja1227/tumor_data/SRR7890851_1.fastq.gz and
/home/yingja1227/tumor_data/SRR7890851_2.fastq.gz
[Parabricks Options Mesg]: 

### Run Alignment and Variant Calling Separately

The alignment and variant calling steps within the somatic workflow can be run separately. Remember to run fq2bam on the tumor fastq and normal fastqa separately (see Tutorial 1). Variant calling can be done by Mutext2 or DeepSomatic.

- Variant calling by GATK Mutect2

In [None]:
!pbrun mutectcaller \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-normal-bam tumor_output/normal.bam \
    --in-tumor-bam tumor_output/tumor.bam \
    --out-vcf tumor_output/somatic_variants.vcf \
    --mutect-low-memory \
    --num-gpus 1

- DeepSomatics

In [None]:
!pbrun deepsomatic \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-normal-bam tumor_output/normal.bam \
    --in-tumor-bam tumor_output/tumor.bam \
    --out-vcf tumor_output/somatic_variants.vcf \