# Analyzing nanopore sequencing data for AMBIC

## Set up
The necessary software should already be installed here.  
It is described in README.md.  
I made a snakemake pipeline for ease of use; I will go through each step using both the direct commands and snakemake.

In [57]:
# first set working directory
workdir=~/Data
codedir=$PWD
# alternative reference, not used in this run:
# reference=/mnt/ref/4_0cdhfr_vrc01wtg1m3_dgv.fa
reference=/mnt/ref/Cricetulus_griseus_picr.CriGri-PICR.dna.toplevel.nihIgG.fa
threads=12
sample=CHOK1IgG
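
Before running anything, it can help to confirm that the tools are actually on the `PATH`. The following is a minimal sketch; `check_tools` is a hypothetical helper name, not part of the pipeline, and the tool list is simply the set of commands used in this walkthrough.

```shell
# Report which of the given command names cannot be found on PATH.
# Prints one "missing: <tool>" line per absent tool and returns
# non-zero if at least one tool is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return $missing
}

# Commands used in this walkthrough
check_tools minimap2 samtools nanopolish snakemake \
    || echo "install the tools listed above before continuing"
```

If everything is installed, the call prints nothing.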

## Alignment
We use two main aligners for nanopore analysis: minimap2 and ngmlr.
### minimap2
First, build the index.

In [11]:
mmi=${reference%.fa}.map_ont.mmi
if [ ! -e $mmi ]; then
    minimap2 -x map-ont -d $mmi $reference
fi
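
The index name above comes from the shell's `${var%pattern}` expansion, which strips the shortest matching suffix. A small sketch of what it produces for the reference used here; the second path, `genome.fasta`, is a made-up name to show what happens when the reference does not end in `.fa`.

```shell
# Reference path from the setup cell
reference=/mnt/ref/Cricetulus_griseus_picr.CriGri-PICR.dna.toplevel.nihIgG.fa

# ${reference%.fa} removes a trailing ".fa" before the new suffix is added
mmi=${reference%.fa}.map_ont.mmi
echo "$mmi"

# If the name does not end in ".fa", nothing is stripped (hypothetical file name)
other=genome.fasta
echo "${other%.fa}.map_ont.mmi"
```

The first `echo` gives the same `.nihIgG.map_ont.mmi` path that the index is written to; the second keeps the full `genome.fasta` name, so a reference with a different extension would need a different pattern.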

In [12]:
ls -lh $mmi

-rw-r--r-- 1 jupyter-isac jupyter-isac 5.6G Oct 23 11:59 /mnt/ref/Cricetulus_griseus_picr.CriGri-PICR.dna.toplevel.nihIgG.map_ont.mmi


Then align the reads.

In [32]:
# create the output directory under $workdir (not the current directory)
mkdir -p $workdir/bam
fq=$workdir/reads/$sample.fastq.gz
outpre=$workdir/bam/$sample.minimap2
output=$outpre.sorted.bam
minimap2 --MD -L -t $threads -a $mmi ${fq} |\
    samtools view -q 20 -b - |\
    samtools sort -T $outpre.sorting -o ${output} &&\
    samtools index ${output}
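
Because the alignment is a pipeline, a failure in an early stage can still leave an empty or missing file behind. A minimal sketch of a check for this, assuming only that the outputs should exist and be non-empty; `check_outputs` is a hypothetical helper name.

```shell
# Succeed only if every given file exists and is non-empty.
# Prints one "missing or empty: <file>" line per problem file.
check_outputs() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            status=1
        fi
    done
    return $status
}
```

After the alignment cell, `check_outputs "$output" "$output.bai"` should print nothing and return success.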

Snakemake version:

In [43]:
rm $workdir/bam/*
output=$workdir/bam/$sample.minimap2.sorted.bam
snakemake $output

rm: cannot remove '/home/jupyter-isac/Data/bam/*': No such file or directory
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	minimap2_align
	1

[Wed Oct 23 13:34:50 2019]
rule minimap2_align:
    input: /mnt/ref/Cricetulus_griseus_picr.CriGri-PICR.dna.toplevel.nihIgG.map_ont.mmi, /home/jupyter-isac/Data/reads/CHOK1IgG.fastq.gz
    output: /home/jupyter-isac/Data/bam/CHOK1IgG.minimap2.sorted.bam
    log: /home/jupyter-isac/Data/bam/CHOK1IgG.minimap2.align.log
    jobid: 0
    wildcards: dir=/home/jupyter-isac/Data, sample=CHOK1IgG

[Wed Oct 23 13:35:15 2019]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /home/jupyter-isac/ambic-epigenome-dev/.snakemake/log/2019-10-23T133450.598278.snakemake.log


In [45]:
ls -lh $workdir/bam

total 18M
-rw-r--r-- 1 jupyter-isac jupyter-isac  639 Oct 23 13:35 CHOK1IgG.minimap2.align.log
-rw-r--r-- 1 jupyter-isac jupyter-isac  17M Oct 23 13:35 CHOK1IgG.minimap2.sorted.bam
-rw-r--r-- 1 jupyter-isac jupyter-isac 680K Oct 23 13:35 CHOK1IgG.minimap2.sorted.bam.bai


### ngmlr

## Methylation calling
We use nanopolish for methylation calling.
### nanopolish index
The first step is to index the reads.

In [55]:
rdir=$workdir/reads/$sample
nanopolish index -d $rdir $rdir.fastq.gz

[readdb] indexing /home/jupyter-isac/Data/reads/CHOK1IgG
[readdb] num reads: 1567, num reads with path to fast5: 1567


In [52]:
output=$rdir.fastq.gz.index.readdb
rm $output
snakemake $output

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	nanopolish_index
	1

[Wed Oct 23 13:56:06 2019]
rule nanopolish_index:
    input: /home/jupyter-isac/Data/reads/CHOK1IgG.fastq.gz
    output: /home/jupyter-isac/Data/reads/CHOK1IgG.fastq.gz.index.readdb
    jobid: 0
    wildcards: pre=/home/jupyter-isac/Data/reads/CHOK1IgG

[Wed Oct 23 13:56:07 2019]
Finished job 0.
1 of 1 steps (100%) done
Complete log: /home/jupyter-isac/ambic-epigenome-dev/.snakemake/log/2019-10-23T135606.367563.snakemake.log


In [54]:
ls -lh $workdir/reads

total 13M
drwxr-xr-x 2 jupyter-isac root         4.0K Oct 22 18:28 CHOK1IgG
-rw-r--r-- 1 jupyter-isac root         9.6M Oct 22 18:28 CHOK1IgG.fastq.gz
-rw-r--r-- 1 jupyter-isac jupyter-isac 2.8M Oct 23 13:56 CHOK1IgG.fastq.gz.index
-rw-r--r-- 1 jupyter-isac jupyter-isac  91K Oct 23 13:56 CHOK1IgG.fastq.gz.index.fai
-rw-r--r-- 1 jupyter-isac jupyter-isac 2.5K Oct 23 13:56 CHOK1IgG.fastq.gz.index.gzi
-rw-r--r-- 1 jupyter-isac jupyter-isac 140K Oct 23 13:56 CHOK1IgG.fastq.gz.index.readdb
-rw-r--r-- 1 jupyter-isac jupyter-isac  118 Oct 23 13:56 CHOK1IgG.index.log


### call methylation


In [58]:
fastq=$workdir/reads/$sample.fastq.gz
bam=$workdir/bam/$sample.minimap2.sorted.bam
outdir=$workdir/mcall
[ -e $outdir ]||mkdir $outdir
output=$outdir/$sample.cpg.meth.tsv.gz
nanopolish call-methylation -v -t ${threads} -q cpg \
    -g $reference -r $fastq -b $bam |\
    gzip > $output

[post-run summary] total reads: 6388, unparseable: 0, qc fail: 1, could not calibrate: 3, no alignment: 867, bad fast5: 0
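
To get a quick feel for the calls, the per-site log-likelihood ratios in the gzipped TSV can be tallied. This is only a sketch under stated assumptions: the file has a header row containing a `log_lik_ratio` column (looked up by name rather than position), and the cutoff of 2.0 is an illustrative value, not one taken from this pipeline. `summarize_meth_calls` is a hypothetical helper name.

```shell
# Tally calls in a gzipped methylation TSV by log-likelihood ratio.
# Arguments: <file.tsv.gz> [threshold]   (threshold defaults to 2.0, an assumed cutoff)
# Rows with ratio >= threshold count as methylated, <= -threshold as
# unmethylated, and everything in between as ambiguous.
summarize_meth_calls() {
    local file=$1 threshold=${2:-2.0}
    gzip -dc "$file" | awk -F'\t' -v t="$threshold" '
        NR == 1 {
            # find the ratio column from the header row
            for (i = 1; i <= NF; i++) if ($i == "log_lik_ratio") col = i
            if (!col) { print "no log_lik_ratio column found" > "/dev/stderr"; exit 1 }
            next
        }
        { total++ }
        $col >=  t { meth++ }
        $col <= -t { unmeth++ }
        END {
            if (col) printf "total=%d methylated=%d unmethylated=%d ambiguous=%d\n",
                            total, meth, unmeth, total - meth - unmeth
        }'
}
```

With the variables from the cell above, this would be called as `summarize_meth_calls "$output"`.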


In [63]:
rm $workdir/mcall/*
snakemake -j $threads -p $output

rm: cannot remove '/home/jupyter-isac/Data/mcall/*': No such file or directory
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 12
Rules claiming more threads will be scaled down.
Job counts:
	count	jobs
	1	call_cpg
	1

[Wed Oct 23 14:21:46 2019]
rule call_cpg:
    input: /home/jupyter-isac/Data/reads/CHOK1IgG.fastq.gz, /home/jupyter-isac/Data/bam/CHOK1IgG.minimap2.sorted.bam, /home/jupyter-isac/Data/reads/CHOK1IgG.fastq.gz.index.readdb
    output: /home/jupyter-isac/Data/mcall/CHOK1IgG.cpg.meth.tsv.gz
    jobid: 0
    wildcards: dir=/home/jupyter-isac/Data, sample=CHOK1IgG
    threads: 12

nanopolish call-methylation -v -t 12 -q cpg -g /mnt/ref/Cricetulus_griseus_picr.CriGri-PICR.dna.toplevel.nihIgG.fa -r /home/jupyter-isac/Data/reads/CHOK1IgG.fastq.gz -b /home/jupyter-isac/Data/bam/CHOK1IgG.minimap2.sorted.bam | gzip > /home/jupyter-isac/Data/mcall/CHOK1IgG.cpg.meth.tsv.gz
[post-ru

In [64]:
ls -lh $workdir/mcall

total 1.3M
-rw-r--r-- 1 jupyter-isac jupyter-isac 1.3M Oct 23 14:23 CHOK1IgG.cpg.meth.tsv.gz
