# Bioinformatics of the Central Dogma - IPython Notebook

# Objective: Perform bioinformatic analysis of DNA sequencing data using command-line tools

## Day 1: Exome Sequencing Analysis

We will use exome sequencing data from a family trio (father, mother, and affected proband) to identify potentially disease-associated mutations.

## Step 1: Set up the environment

### Install the necessary bioinformatics tools using mamba (a conda-compatible package manager)

In [None]:
!mamba install -c conda-forge -c bioconda fastqc multiqc bwa samtools bcftools freebayes snpeff gemini -y

## Step 2: Download the raw sequencing data

In [None]:
# Download raw FASTQ files from Zenodo

In [None]:
!wget https://zenodo.org/record/3243160/files/father_R1.fq.gz

In [None]:
!wget https://zenodo.org/record/3243160/files/father_R2.fq.gz

In [None]:
!wget https://zenodo.org/record/3243160/files/mother_R1.fq.gz

In [None]:
!wget https://zenodo.org/record/3243160/files/mother_R2.fq.gz

In [None]:
!wget https://zenodo.org/record/3243160/files/proband_R1.fq.gz

In [None]:
!wget https://zenodo.org/record/3243160/files/proband_R2.fq.gz
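The six downloads above follow one naming pattern: sample name, then read mate, then `.fq.gz`. A minimal sketch that generates the same URLs in a loop is shown here; the helper name `fastq_urls` and the constants are illustrative and not part of the original notebook.

```python
from itertools import product

# Values taken from the wget commands above.
BASE_URL = "https://zenodo.org/record/3243160/files"
SAMPLES = ("father", "mother", "proband")
MATES = ("R1", "R2")


def fastq_urls(base_url=BASE_URL, samples=SAMPLES, mates=MATES):
    """Return one URL per sample and read mate (hypothetical helper)."""
    return [f"{base_url}/{sample}_{mate}.fq.gz"
            for sample, mate in product(samples, mates)]


for url in fastq_urls():
    print(url)
```

In a notebook cell, each URL could then be fetched with `!wget {url}` inside the loop, since IPython expands Python variables in shell commands.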

## Step 3: Quality Control

In [None]:
# Run FastQC to evaluate the quality of the raw data

In [None]:
!fastqc father_R1.fq.gz father_R2.fq.gz mother_R1.fq.gz mother_R2.fq.gz proband_R1.fq.gz proband_R2.fq.gz -o .

In [None]:
# Aggregate FastQC results using MultiQC

In [None]:
!multiqc .

## Step 4: Alignment

In [None]:
# Map the sequencing data to the human reference genome (hg19) using BWA-MEM

In [None]:
!bwa mem -R '@RG\tID:000\tSM:father' hg19.fa father_R1.fq.gz father_R2.fq.gz > father.sam

In [None]:
!bwa mem -R '@RG\tID:001\tSM:mother' hg19.fa mother_R1.fq.gz mother_R2.fq.gz > mother.sam

In [None]:
!bwa mem -R '@RG\tID:002\tSM:proband' hg19.fa proband_R1.fq.gz proband_R2.fq.gz > proband.sam
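The three `bwa mem` calls differ only in the sample name and read-group ID, and they assume that `hg19.fa` is already present and indexed. The sketch below builds those commands, plus the `bwa index` and `samtools faidx` calls that would typically be run once beforehand. The function names are illustrative, and the code only prints the commands; it does not execute them.

```python
import shlex

SAMPLES = ["father", "mother", "proband"]


def index_commands(reference="hg19.fa"):
    """Commands that index the reference for BWA and for samtools/FreeBayes."""
    return [["bwa", "index", reference], ["samtools", "faidx", reference]]


def bwa_mem_command(sample, rg_id, reference="hg19.fa"):
    """Build the bwa mem argument list for one paired-end sample."""
    # bwa expects a literal backslash followed by 't' here and converts it to a tab.
    read_group = f"@RG\\tID:{rg_id}\\tSM:{sample}"
    return ["bwa", "mem", "-R", read_group, reference,
            f"{sample}_R1.fq.gz", f"{sample}_R2.fq.gz"]


for cmd in index_commands():
    print(shlex.join(cmd))

for number, sample in enumerate(SAMPLES):
    cmd = bwa_mem_command(sample, f"{number:03d}")
    print(shlex.join(cmd), ">", f"{sample}.sam")
```

Keeping the sample name in the `SM` tag matters later: FreeBayes and GEMINI use it to tell the three family members apart.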

In [None]:
# Convert SAM to BAM and sort BAM files

In [None]:
!samtools view -b father.sam | samtools sort -o father.bam -

In [None]:
!samtools view -b mother.sam | samtools sort -o mother.bam -

In [None]:
!samtools view -b proband.sam | samtools sort -o proband.bam -

## Step 5: Filter Alignments

In [None]:
# Filter BAM files to retain only properly paired reads and remove duplicates

In [None]:
!samtools view -b -f 2 father.bam > father.filtered.bam

In [None]:
!samtools view -b -f 2 mother.bam > mother.filtered.bam

In [None]:
!samtools view -b -f 2 proband.bam > proband.filtered.bam
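The `-f 2` option keeps only reads whose SAM FLAG has bit 0x2 ("properly paired") set. The sketch below decodes a FLAG value into its component bits to make that filter concrete. The bit meanings follow the SAM specification; the function names are illustrative.

```python
# Bit values and meanings as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def passes_required(flag, required=0x2):
    """Mimic `samtools view -f`: every required bit must be set."""
    return (flag & required) == required


for flag in (99, 73):
    print(flag, decode_flag(flag), passes_required(flag))
```

A read with FLAG 99 is kept by `-f 2`, while a read with FLAG 73 (mate unmapped, so not a proper pair) is dropped.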

In [None]:
!samtools rmdup father.filtered.bam father.filtered.rmdup.bam

In [None]:
!samtools rmdup mother.filtered.bam mother.filtered.rmdup.bam

In [None]:
!samtools rmdup proband.filtered.bam proband.filtered.rmdup.bam

## Step 6: Variant Calling

In [None]:
# Use FreeBayes to call variants

In [None]:
!freebayes -f hg19.fa -b father.filtered.rmdup.bam mother.filtered.rmdup.bam proband.filtered.rmdup.bam > variants.vcf

## Step 7: Post-processing

In [None]:
# Normalize VCF with bcftools

In [None]:
!bcftools norm -f hg19.fa -m -any variants.vcf -o normalized_variants.vcf
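With `-m -any`, `bcftools norm` splits multiallelic sites into one record per alternate allele, and normalization also trims bases shared by REF and ALT. The sketch below illustrates only those two ideas on plain tuples. It is a simplification, not a reimplementation of `bcftools norm`: it does not left-align indels against the reference and does not rewrite genotype fields. The function names and the example site are made up for illustration.

```python
def split_multiallelic(chrom, pos, ref, alts):
    """Return one (chrom, pos, ref, alt) record per comma-separated ALT allele."""
    return [(chrom, pos, ref, alt) for alt in alts.split(",")]


def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each."""
    # Shared trailing bases can be dropped without moving the position.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Each shared leading base that is dropped shifts the position by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Made-up site with two alternate alleles.
for chrom, pos, ref, alt in split_multiallelic("chr1", 100, "CAG", "CA,TAG"):
    print(chrom, *trim_alleles(pos, ref, alt))
```

Splitting and trimming give each variant a single, minimal representation, which makes later annotation and database lookups more consistent.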

## Step 8: Annotate Variants

In [None]:
# Annotate variants using SnpEff

In [None]:
!snpEff ann hg19 normalized_variants.vcf > annotated_variants.vcf

## Step 9: Create a pedigree file

In [None]:
# Write the pedigree (PED) file: sex is 1 = male, 2 = female; phenotype is 1 = unaffected, 2 = affected
with open('pedigree.ped', 'w') as f:
    f.write("#family_id\tname\tpaternal_id\tmaternal_id\tsex\tphenotype\n")
    f.write("FAM\tfather\t0\t0\t1\t1\n")
    f.write("FAM\tmother\t0\t0\t2\t1\n")
    f.write("FAM\tproband\tfather\tmother\t1\t2\n")
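A pedigree file is easy to get subtly wrong, for example through a parent ID that matches no sample. The sketch below keeps the same three rows as Python tuples, runs a few sanity checks, and renders the tab-separated text. The helper names `check_pedigree` and `ped_text` are illustrative, and the checks assume the usual PED coding (0 for an unknown parent, sex 1 or 2, phenotype 1 or 2).

```python
# Same rows as the pedigree written in this step:
# family_id, name, paternal_id, maternal_id, sex, phenotype
PED_ROWS = [
    ("FAM", "father", "0", "0", 1, 1),
    ("FAM", "mother", "0", "0", 2, 1),
    ("FAM", "proband", "father", "mother", 1, 2),
]


def check_pedigree(rows):
    """Return a list of human-readable problems (empty if none were found)."""
    names = {name for _, name, *_ in rows}
    problems = []
    for _family, name, father, mother, sex, phenotype in rows:
        for parent in (father, mother):
            if parent != "0" and parent not in names:
                problems.append(f"{name}: unknown parent {parent!r}")
        if sex not in (1, 2):
            problems.append(f"{name}: sex code {sex!r} needs review")
        if phenotype not in (1, 2):
            problems.append(f"{name}: phenotype code {phenotype!r} needs review")
    return problems


def ped_text(rows):
    """Render the rows as tab-separated PED text with a header line."""
    header = "#family_id\tname\tpaternal_id\tmaternal_id\tsex\tphenotype"
    lines = ["\t".join(str(value) for value in row) for row in rows]
    return "\n".join([header, *lines]) + "\n"


print(check_pedigree(PED_ROWS))
print(ped_text(PED_ROWS), end="")
```

The names in the second column must match the `SM` sample names used during alignment so that GEMINI can link genotypes to family members.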

## Step 10: Load data into GEMINI

In [None]:
!gemini load -v annotated_variants.vcf -p pedigree.ped -t snpEff database.db

## Step 11: Inheritance Pattern Analysis

In [None]:
# Query GEMINI for rare variants (max_aaf_all < 1% or unknown) that are not low-impact; these are the candidates to examine for inheritance patterns

In [None]:
!gemini query -q "SELECT chrom, start, ref, alt, impact, gene, clinvar_sig, clinvar_disease_name, clinvar_gene_phenotype, max_aaf_all FROM variants WHERE impact_severity != 'LOW' AND (max_aaf_all < 0.01 OR max_aaf_all IS NULL)" -d database.db > inheritance_results.txt
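The query writes its results as plain text, one variant per line. The sketch below reads such output into dictionaries keyed by the column names from the `SELECT` clause. It rests on several assumptions: the output is tab-separated, has no header line, and prints missing values as the string `None`. The two example rows and gene names are made up for illustration and are not real results.

```python
import csv
import io

# Column order copied from the SELECT clause of the query.
COLUMNS = ["chrom", "start", "ref", "alt", "impact", "gene", "clinvar_sig",
           "clinvar_disease_name", "clinvar_gene_phenotype", "max_aaf_all"]


def parse_gemini_rows(handle, columns=COLUMNS):
    """Parse tab-separated query output into a list of dicts (hypothetical helper)."""
    reader = csv.DictReader(handle, fieldnames=columns, delimiter="\t")
    rows = []
    for row in reader:
        aaf = row["max_aaf_all"]
        # Assumption: a missing allele frequency appears as "None" or is empty.
        row["max_aaf_all"] = None if aaf in ("None", "", None) else float(aaf)
        row["start"] = int(row["start"])
        rows.append(row)
    return rows


# Made-up rows standing in for inheritance_results.txt.
example = io.StringIO(
    "chr1\t12345\tA\tG\tmissense_variant\tGENE1\tNone\tNone\tNone\t0.002\n"
    "chr2\t67890\tC\tT\tstop_gained\tGENE2\tNone\tNone\tNone\tNone\n"
)
rows = parse_gemini_rows(example)
for row in rows:
    print(row["gene"], row["impact"], row["max_aaf_all"])
```

For the real file, the same function could be called as `parse_gemini_rows(open("inheritance_results.txt"))`, after checking that the assumptions above hold for the installed GEMINI version.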

## Day 2: Additional Analysis

In [None]:
# More detailed analysis to follow based on Day 1's output...

## Lab Report and Discussion

In [None]:
# Include screenshots, result analysis, and answers to discussion questions