# Intersecting sgRNA sequences with VCF

Use command-line tools (`vcf-merge` from VCFtools, `bgzip`/`tabix`, and `bedtools`) to merge the VCFs and intersect them with the CDS-PAMs generated in step 1.

### Prepare VCF file

**1. Download gnomAD and dbSNP build 151:**
- gnomAD: https://gnomad.broadinstitute.org/downloads#v2-liftover-variants
- dbSNP 151: https://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/

**2. Strip the `chr` prefix from chromosome names in the gnomAD VCF:**
```bash
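# Strip the leading "chr" from record lines; header lines start with "#" and are left as-is
zcat gnomad.exomes.r2.1.1.sites.liftover_grch38.vcf.gz \
    | awk '{gsub(/^chr/,""); print}' \
    | bgzip -c > gnomad.exomes.r2.1.1.sites.liftover_grch38.NoCHR.vcf.gz
# Index the compressed VCF so it can be queried by region
tabix -p vcf gnomad.exomes.r2.1.1.sites.liftover_grch38.NoCHR.vcf.gz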
```
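The `awk` filter in step 2 only touches lines that begin with `chr`, so `##contig=<ID=chr…>` header lines keep their prefix. The sketch below, on a toy VCF with made-up records, shows one possible variant that also rewrites the contig IDs. The `strip_chr` function name and the toy data are illustrative assumptions, not part of the original pipeline.

```bash
# Hypothetical helper: strip "chr" from record lines and from ##contig IDs.
strip_chr() {
    awk '{ sub(/^chr/, ""); sub(/^##contig=<ID=chr/, "##contig=<ID="); print }'
}

# Toy VCF with made-up content (%b expands the \t escapes into tabs)
toy_vcf=$(printf '%b\n' \
    '##fileformat=VCFv4.2' \
    '##contig=<ID=chr1,length=248956422>' \
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO' \
    'chr1\t12345\trs0\tA\tG\t.\tPASS\t.')

stripped=$(printf '%s\n' "$toy_vcf" | strip_chr)
printf '%s\n' "$stripped"
```

Whether the contig header needs rewriting depends on the downstream tools; the original command leaves it unchanged.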

**3. Merge gnomAD and dbSNP:**
```bash
vcf-merge Gnomad_V2_Exomes_hg38_liftover/gnomad.exomes.r2.1.1.sites.liftover_grch38.NoCHR.vcf.gz hg38_All_20180418.vcf.gz  | bgzip -c > Gnomad_V2_Exomes_hg38_liftover-AND-hg38_dbSNP151_20180418.vcf.gz
tabix -p vcf Gnomad_V2_Exomes_hg38_liftover-AND-hg38_dbSNP151_20180418.vcf.gz
```
*Warning: this step can take a very long time (about 2 days).*
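Because the merge is so slow, it may be worth confirming beforehand that both inputs name their chromosomes the same way (the reason for step 2). The following is a minimal sketch on toy data; `chrom_names` is a hypothetical helper and the records are made up.

```bash
# Hypothetical helper: list the distinct CHROM values of a VCF read from stdin,
# skipping header lines.
chrom_names() {
    awk -F'\t' '!/^#/ { print $1 }' | sort -u
}

# Two toy inputs: one without and one with the "chr" prefix
a=$(printf '%b\n' '#CHROM\tPOS' '1\t100' '2\t200' | chrom_names)
b=$(printf '%b\n' '#CHROM\tPOS' 'chr1\t100' 'chr2\t200' | chrom_names)

if [ "$a" = "$b" ]; then
    naming="consistent"
else
    naming="mismatch"
fi
echo "chromosome naming: $naming"
```

For the real, indexed files, `tabix -l file.vcf.gz` lists the sequence names from the index without decompressing the whole file.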


**4. Split by chromosome:**

The intersection in step 5 runs one chromosome at a time, so the merged VCF is split into one file per chromosome (1-22 and X).
```bash
vcf=Gnomad_V2_Exomes_hg38_liftover-AND-hg38_dbSNP151_20180418
mkdir -p "$vcf"  # the output directory must exist before redirecting into it
# Autosomes 1-22 plus X; tabix without -h writes records only (no VCF header)
for i in $(seq 1 22) X; do
    echo "$i"
    tabix "${vcf}.vcf.gz" "$i" > "${vcf}/$i.vcf"
done
```
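As a quick check of what the split produces, the sketch below mimics it on a toy VCF. It uses `awk` instead of `tabix` so that it runs without an index, and all records are made up; like the `tabix` calls above, it writes records only, without header lines.

```bash
tmpdir=$(mktemp -d)

# Toy VCF with made-up records on chromosomes 1 and X
printf '%b\n' '##fileformat=VCFv4.2' '#CHROM\tPOS\tID\tREF\tALT' \
    '1\t100\trs1\tA\tG' '1\t250\trs2\tC\tT' 'X\t300\trs3\tG\tA' > "$tmpdir/toy.vcf"

# Write non-header lines to one file per chromosome, named after column 1
awk -F'\t' -v dir="$tmpdir" '!/^#/ { print > (dir "/" $1 ".vcf") }' "$tmpdir/toy.vcf"

# Count the records that landed in each per-chromosome file
n1=$(wc -l < "$tmpdir/1.vcf" | tr -d ' ')
nX=$(wc -l < "$tmpdir/X.vcf" | tr -d ' ')
echo "chr1: $n1 records, chrX: $nX records"

rm -rf "$tmpdir"
```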


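**5. Intersect with CRISPR/Cas9 target sequences:**

Run `bedtools intersect` once per chromosome, with the CDS-PAM BED file as `-a` and the matching per-chromosome VCF as `-b`. The `-loj` (left outer join) option keeps every CDS-PAM interval in the output, including intervals that overlap no variant.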


```bash
for i in `seq 1 22`; do 
    echo $i; 
    bedtools intersect \
    -a ./frontend/data/bed/CDSpams-byChrom/$i.bed \
    -b /home/lingj/zhanglab/shared/genomes/vcf/Gnomad_V2_Exomes_hg38_liftover-AND-hg38_dbSNP151_20180418/$i.vcf \
    -loj > ./frontend/data/bed/CDSpams-byChrom/$i.intersect.bed; 
done
```

```bash
i="X"
echo $i; 
bedtools intersect \
-a ./frontend/data/bed/CDSpams-byChrom/$i.bed \
-b /home/lingj/zhanglab/shared/genomes/vcf/Gnomad_V2_Exomes_hg38_liftover-AND-hg38_dbSNP151_20180418/$i.vcf \
-loj > ./frontend/data/bed/CDSpams-byChrom/$i.intersect.bed; 
```
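To illustrate how the `-loj` output might be summarized, the sketch below counts overlapping variants per target interval on a toy table. It rests on assumptions not confirmed by this notebook: a 4-column BED layout for `-a`, and `.` in the first `-b` column for intervals without any overlapping variant. The `sgRNA_a`/`sgRNA_b` names and all records are made up.

```bash
# Hypothetical -loj output: 4 BED columns from -a, then the VCF columns from -b.
# Assumption: rows without an overlapping variant carry "." in the first -b column.
toy=$(printf '%b\n' \
    '1\t100\t123\tsgRNA_a\t1\t110\trs1\tA\tG\t.\t.\t.' \
    '1\t100\t123\tsgRNA_a\t1\t120\trs2\tC\tT\t.\t.\t.' \
    '1\t500\t523\tsgRNA_b\t.\t-1\t.\t.\t.\t.\t.\t.')

# na = number of columns contributed by the -a file (assumed to be 4 here)
summary=$(printf '%s\n' "$toy" | awk -F'\t' -v na=4 '
    { key = $1 ":" $2 "-" $3 }
    $(na + 1) != "." { hits[key]++ }
    $(na + 1) == "." { hits[key] += 0 }
    END { for (k in hits) print k "\t" hits[k] }' | sort)
printf '%s\n' "$summary"
```

Adjust `na` to the real number of columns in the CDS-PAM BED files before applying this to the `*.intersect.bed` outputs.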

**Software versions** (as reported by `%watermark -n -u -v -iv -w`, `vcftools --version`, and `bedtools --version`; last updated Sun Aug 28 2022):
- Python (CPython): 3.7.9
- IPython: 7.22.0
- watermark: 2.3.1
- VCFtools: 0.1.16
- bedtools: v2.30.0
