# Parabricks Hands-On Workshop

#### Tutorial 4: Gene Fusion Detection Pipeline

For gene fusion detection, we will run the following analysis steps:

- RNA read alignment (for calling gene fusions from RNA-seq)
- Gene fusion detection

We will start by downloading the reference genome and sample files, then proceed through the analysis step by step.

#### GPU Monitoring

In [1]:
!nvidia-smi

Wed Sep  3 12:34:43 2025       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla V100-SXM2-32GB           On  | 00000000:1B:00.0 Off |                    0 |
| N/A   29C    P0              68W / 300W |  11063MiB / 32768MiB |     20%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

In [None]:
# To monitor the GPU continuously, run the command below in a separate terminal:
# watch -n 0.5 nvidia-smi
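The same information can be collected from Python, which is handy for logging GPU load while a pipeline step runs. The sketch below is a minimal example, assuming `nvidia-smi` is on the `PATH` and that every queried field returns a number (some GPUs report `[N/A]` for certain fields, which this parser does not handle). The function names are hypothetical; the sample line reuses the values from the table above.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi, in the order the parser expects them.
QUERY_FIELDS = ["name", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(text):
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        name, used, total, util = record
        rows.append({
            "name": name,
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
            "utilization_pct": int(util),
        })
    return rows


def query_gpus():
    """Run nvidia-smi once and return one dict per GPU (needs a GPU host)."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return parse_gpu_csv(out)


# Sample line shaped like the query output, using the values shown above.
sample = "Tesla V100-SXM2-32GB, 11063, 32768, 20\n"
print(parse_gpu_csv(sample))
```

On a machine with a GPU, calling `query_gpus()` inside a loop with `time.sleep(0.5)` gives roughly the same view as the `watch` command.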

#### Download Sample Dataset

In [2]:
# Download the STAR-Fusion benchmark dataset (VCaP prostate cancer cell line)
%cd rna_data
!wget https://zenodo.org/records/13363154/files/SRR1217085_1.fastq.gz.20M.fq.gz
!wget https://zenodo.org/records/13363154/files/SRR1217085_2.fastq.gz.20M.fq.gz
%cd ..

/home/yingja1227/rna_data
--2025-09-03 12:34:53--  https://zenodo.org/records/13363154/files/SRR1217085_1.fastq.gz.20M.fq.gz
Resolving zenodo.org (zenodo.org)... 188.185.43.25, 188.185.48.194, 188.185.45.92, ...
Connecting to zenodo.org (zenodo.org)|188.185.43.25|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2158029851 (2.0G) [application/octet-stream]
Saving to: ‘SRR1217085_1.fastq.gz.20M.fq.gz.1’


2025-09-03 12:37:31 (13.2 MB/s) - ‘SRR1217085_1.fastq.gz.20M.fq.gz.1’ saved [2158029851/2158029851]

--2025-09-03 12:37:31--  https://zenodo.org/records/13363154/files/SRR1217085_2.fastq.gz.20M.fq.gz
Resolving zenodo.org (zenodo.org)... 188.185.45.92, 188.185.43.25, 188.185.48.194, ...
Connecting to zenodo.org (zenodo.org)|188.185.45.92|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2048873466 (1.9G) [application/octet-stream]
Saving to: ‘SRR1217085_2.fastq.gz.20M.fq.gz.1’


2025-09-03 12:40:00 (13.2 MB/s) - ‘SRR1217085_2.fastq.gz.20M.fq.gz.1’ saved [2048873466/2048873466]

/home/yingja1227
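Note that the log above shows both files saved with a `.1` suffix, which is what `wget` does when a file of the same name already exists in the target directory. A minimal sketch for skipping files that are already complete is shown below. The expected sizes are copied from the `Length:` lines in the log; the function names are hypothetical, and comparing sizes is only a cheap check, not a checksum.

```python
import os

# Expected sizes in bytes, copied from the "Length:" lines in the wget log above.
EXPECTED_SIZES = {
    "SRR1217085_1.fastq.gz.20M.fq.gz": 2158029851,
    "SRR1217085_2.fastq.gz.20M.fq.gz": 2048873466,
}


def needs_download(path, expected_size):
    """Return True if the file is missing or its size differs from expected_size."""
    return not os.path.isfile(path) or os.path.getsize(path) != expected_size


def files_to_fetch(data_dir, expected_sizes=EXPECTED_SIZES):
    """List the file names in expected_sizes that still need to be downloaded."""
    return [
        name
        for name, size in expected_sizes.items()
        if needs_download(os.path.join(data_dir, name), size)
    ]


# Names returned here are the ones worth passing to wget.
print(files_to_fetch("rna_data"))
```

Passing `-nc` (no-clobber) to `wget` also avoids the duplicate `.1` copies, although it will not notice a partially downloaded file.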


Next, download the CTAT genome resource library used by STAR-Fusion. The copy fetched below was originally obtained from https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB/

In [9]:
!wget https://cos.twcc.ai/pbworkshop/Ref/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz
!tar --strip-components=1 -xvzf GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz \
    -C Ref

--2025-09-03 15:04:16--  https://cos.twcc.ai/pbworkshop/Ref/GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz
Resolving cos.twcc.ai (cos.twcc.ai)... 203.145.219.21
Connecting to cos.twcc.ai (cos.twcc.ai)|203.145.219.21|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 33494478624 (31G) [binary/octet-stream]
Saving to: ‘GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz.1’


2025-09-03 15:08:51 (116 MB/s) - ‘GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play.tar.gz.1’ saved [33494478624/33494478624]

GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/
GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/AnnotFilterRule.pm
GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/ref_genome.fa
GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/__chkpts/
GRCh38_gencode_v37_CTAT_lib_Mar012021.plug-n-play/ctat_genome_lib_build_dir/__chkpts/ref_annot.gtf.ok
GRCh38_ge

In [10]:
!ls Ref

Homo_sapiens_assembly38.dict
Homo_sapiens_assembly38.fasta
Homo_sapiens_assembly38.fasta.amb
Homo_sapiens_assembly38.fasta.ann
Homo_sapiens_assembly38.fasta.bwt
Homo_sapiens_assembly38.fasta.fai
Homo_sapiens_assembly38.fasta.pac
Homo_sapiens_assembly38.fasta.sa
Homo_sapiens_assembly38.known_indels.vcf.gz
Homo_sapiens_assembly38.known_indels.vcf.gz.tbi
ctat_genome_lib_build_dir
gencode.v48.primary_assembly.annotation.gtf
gencode.v48.primary_assembly.annotation.gtf.gz
genomelib
genomelib.tar.gz


#### Run RNA Alignment for Gene Fusion Detection
We will use the Parabricks `rna_fq2bam` tool to run RNA alignment. This is the same command used for the RNA alignment step in gene expression analysis, but the following options must also be set:
- `--min-chim-segment`: minimum chimeric segment length; a larger value gives more specificity.
- `--out-chim-format`: `1` includes commented header lines in `Chimeric.out.junction` and `0` does not. The STAR-Fusion step that follows reads this file.

See Tutorial 2 (Bulk RNA Sequencing) for how to build the genome library.
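To see why a longer minimum chimeric segment gives more specificity, consider a read whose alignment is split across two loci. The sketch below is a simplified model, not the aligner's actual implementation: it assumes a chimeric candidate is kept only if every aligned segment is at least `--min-chim-segment` bases long. The read names and segment lengths are made up for illustration.

```python
def passes_min_chim_segment(segment_lengths, min_chim_segment):
    """True if every aligned segment of a chimeric read is long enough."""
    return all(length >= min_chim_segment for length in segment_lengths)


def kept_reads(candidates, min_chim_segment):
    """Names of candidate reads that survive the minimum-segment filter."""
    return sorted(
        name
        for name, segments in candidates.items()
        if passes_min_chim_segment(segments, min_chim_segment)
    )


# Hypothetical 100-base reads, each split into two segments.
candidates = {"read_a": (88, 12), "read_b": (92, 8), "read_c": (50, 50)}

# Raising the threshold keeps fewer, better-supported candidates.
for threshold in (8, 12, 20):
    print(threshold, kept_reads(candidates, threshold))
```

With the value of 12 used in this tutorial, `read_b` is dropped because its shorter segment covers only 8 bases, which is more likely to match the genome by chance.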

In [11]:
!pbrun rna_fq2bam \
    --genome-lib-dir Ref/genomelib \
    --ref Ref/Homo_sapiens_assembly38.fasta \
    --in-fq rna_data/SRR1217085_1.fastq.gz.20M.fq.gz rna_data/SRR1217085_2.fastq.gz.20M.fq.gz \
    --out-bam rna_output/rna_fq2bam.bam \
    --output-dir rna_output \
    --read-files-command zcat \
    --low-memory \
    --tmp-dir tmp \
    --min-chim-segment 12 \
    --out-chim-format 1 # 1 to include commented headers for Chimeric.out.junction file

Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation



[Parabricks Options Mesg]: Automatically generating ID prefix
[Parabricks Options Mesg]: Read group created for /home/yingja1227/rna_data/SRR1217085_1.fastq.gz.20M.fq.gz and
/home/yingja1227/rna_data/SRR1217085_2.fastq.gz.20M.fq.gz
[Parabricks Options Mesg]: @RG\tID:C211TACXX.6\tLB:lib1\tPL:bar\tSM:sample\tPU:C211TACXX.6
[PB Info 2025-Sep-03 15:17:29] ------------------------------------------------------------------------------
[PB Info 2025-Sep-03 15:17:29] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2025-Sep-03 15:17:29] ||                              Version 4.4.0-1                             ||
[PB Info 2025-Sep-03 15:17:29] ||                                   star                                   ||
[PB Info 2025-Sep-03 15:17:29] ------------------------------------------------------------------------------
[PB Info 2025-Sep-03 15:17:29]  ..... star

#### Run Gene Fusion Detection

In [19]:
!pbrun starfusion \
    --output-dir rna_output/fusion_output \
    --genome-lib-dir Ref/ctat_genome_lib_build_dir \
    --chimeric-junction rna_output/Chimeric.out.junction \
    --tmp-dir tmp

Please visit https://docs.nvidia.com/clara/#parabricks for detailed documentation

[PB Info 2025-Sep-03 15:54:58] ------------------------------------------------------------------------------
[PB Info 2025-Sep-03 15:54:58] ||                 Parabricks accelerated Genomics Pipeline                 ||
[PB Info 2025-Sep-03 15:54:58] ||                              Version 4.4.0-1                             ||
[PB Info 2025-Sep-03 15:54:58] ||                                starfusion                                ||
[PB Info 2025-Sep-03 15:54:58] ------------------------------------------------------------------------------
[PB Info 2025-Sep-03 15:54:58] Running starfusion...
[PB Info 2025-Sep-03 15:54:58] Reading "ref_annot.gtf.gene_spans"...
[PB Info 2025-Sep-03 15:54:58] Reading "blast_pairs.idx"...
[PB Info 2025-Sep-03 15:54:58] Reading "trans.blast.align_coords.align_coords.dat"...
[PB Info 2025-Sep-03 15:54:58] Reading "fusion_annot_lib.idx"...
[PB Info 2025-Sep-03 15:54:58] Rea
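The fusion step depends on files written by earlier cells, so it can help to verify them before launching `pbrun starfusion`. The sketch below assembles the same command as the cell above and checks its inputs first. The helper names are hypothetical; the flags and paths are the ones used in this tutorial.

```python
import os


def build_starfusion_cmd(junction, genome_lib_dir, output_dir, tmp_dir="tmp"):
    """Return the argument list for the `pbrun starfusion` call used above."""
    return [
        "pbrun", "starfusion",
        "--output-dir", output_dir,
        "--genome-lib-dir", genome_lib_dir,
        "--chimeric-junction", junction,
        "--tmp-dir", tmp_dir,
    ]


def check_inputs(junction, genome_lib_dir):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if not os.path.isfile(junction) or os.path.getsize(junction) == 0:
        problems.append(f"missing or empty junction file: {junction}")
    if not os.path.isdir(genome_lib_dir):
        problems.append(f"missing genome library directory: {genome_lib_dir}")
    return problems


cmd = build_starfusion_cmd(
    junction="rna_output/Chimeric.out.junction",
    genome_lib_dir="Ref/ctat_genome_lib_build_dir",
    output_dir="rna_output/fusion_output",
)
print(" ".join(cmd))

# To execute it on a machine with Parabricks installed:
#   import subprocess
#   if not check_inputs(cmd[7], cmd[5]):
#       subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if any of the paths contain spaces.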

In [20]:
!ls rna_output/fusion_output

filter.intermediates_dir
fusion_candidates.preliminary
fusion_candidates.preliminary.filtered
fusion_candidates.preliminary.filtered.FFPM
fusion_candidates.preliminary.wSpliceInfo
fusion_candidates.preliminary.wSpliceInfo.wAnnot
fusion_candidates.preliminary.wSpliceInfo.wAnnot.annot_filter.fail
fusion_candidates.preliminary.wSpliceInfo.wAnnot.annot_filter.pass
fusion_candidates.preliminary.wSpliceInfo.wAnnot.annot_filter.pass.RTartifact.filtered
fusion_candidates.preliminary.wSpliceInfo.wAnnot.annot_filter.pass.RTartifact.pass
fusion_candidates.preliminary.wSpliceInfo.wAnnot.annot_filter.pass.RTartifact.pass.brkptselect.filtered
fusion_candidates.preliminary.wSpliceInfo.wAnnot.annot_filter.pass.RTartifact.pass.brkptselect.pass
fusion_candidates.preliminary.wSpliceInfo.wAnnot.annot_filter.pass.RTartifact.pass.brkptselect.pass.minFFPM.0.1.pass
fusion_predictions.abridged.tsv
fusion_predictions.tsv
junction_breakpts_to_genes.txt
junction_breakpts_to_genes.txt.fail
junction_breakpts_to_gen
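Of the files listed above, `fusion_predictions.abridged.tsv` is usually the most convenient starting point for inspection. The sketch below ranks fusions by supporting read counts. It assumes the file is tab-separated with the column names `#FusionName`, `JunctionReadCount`, and `SpanningFragCount`, as in STAR-Fusion output; check the header of your own file before relying on them. The sample rows are made up and are not results from this dataset.

```python
import csv
import io


def top_fusions(tsv_text, n=5):
    """Return the n fusions with the most junction reads, then spanning fragments."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [
        (
            row["#FusionName"],
            int(row["JunctionReadCount"]),
            int(row["SpanningFragCount"]),
        )
        for row in reader
    ]
    rows.sort(key=lambda r: (r[1], r[2]), reverse=True)
    return rows[:n]


# Made-up rows with placeholder gene names, for illustration only.
SAMPLE = (
    "#FusionName\tJunctionReadCount\tSpanningFragCount\n"
    "GENEA--GENEB\t12\t4\n"
    "GENEC--GENED\t40\t9\n"
    "GENEE--GENEF\t3\t0\n"
)
print(top_fusions(SAMPLE, n=2))
# [('GENEC--GENED', 40, 9), ('GENEA--GENEB', 12, 4)]
```

To apply it to the real output, read the file first, for example `top_fusions(open("rna_output/fusion_output/fusion_predictions.abridged.tsv").read())`.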