## BIOS 470/570 Lecture 14

## Last time we covered:
* ### Modeling single-cell RNA-seq data and eliminating unwanted variation with scVI
* ### Pseudotime analysis

## Today we will cover:
* ### Sequence alignment algorithms

We will need the biopython package. If you don't already have it, install it with `conda install biopython -c conda-forge`. I recommend *not* installing it into your scvi environment.

In [1]:
from Bio import Align
import numpy as np
import pandas as pd

### The `Align` module from Biopython has a pairwise aligner that implements the Needleman-Wunsch (global) and Smith-Waterman (local) algorithms and their extensions.

### First define an aligner object. It takes the sequences and produces the alignments.

In [17]:
aligner = Align.PairwiseAligner()

### The parameters to use are stored in the aligner:

In [18]:
print(aligner)

Pairwise sequence aligner with parameters
  match_score: 1.000000
  mismatch_score: 0.000000
  target_internal_open_gap_score: 0.000000
  target_internal_extend_gap_score: 0.000000
  target_left_open_gap_score: 0.000000
  target_left_extend_gap_score: 0.000000
  target_right_open_gap_score: 0.000000
  target_right_extend_gap_score: 0.000000
  query_internal_open_gap_score: 0.000000
  query_internal_extend_gap_score: 0.000000
  query_left_open_gap_score: 0.000000
  query_left_extend_gap_score: 0.000000
  query_right_open_gap_score: 0.000000
  query_right_extend_gap_score: 0.000000
  mode: global



### The default has a match score of 1; all other scores are zero.
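To see what a global alignment score means under these settings, here is a minimal plain-Python sketch of the dynamic programming recurrence for global alignment with a single per-position gap score. It is not Biopython's implementation; the function name `global_alignment_score` and its parameter names are made up for illustration. With a match score of 1 and zero for mismatches and gaps, the best score is simply the largest number of letters that can be matched in order.

```python
def global_alignment_score(target, query, match=1.0, mismatch=0.0, gap=0.0):
    """Best global alignment score with a linear (per-position) gap score.

    Illustrative sketch only; names and defaults are assumptions that mirror
    the default aligner parameters printed above.
    """
    n, m = len(target), len(query)
    # score[i][j] holds the best score for aligning target[:i] with query[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # target prefix aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # query prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if target[i - 1] == query[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align the two letters
                score[i - 1][j] + gap,       # target letter opposite a gap
                score[i][j - 1] + gap,       # query letter opposite a gap
            )
    return score[n][m]


default_score = global_alignment_score("atgaat", "atgcat")
print(default_score)  # 5.0 with match=1 and all other scores 0
```

This only returns the score. Recovering the alignments themselves requires tracing back through the table, which is what the aligner does for us.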

### Let's see an alignment produced with these default scores:

In [15]:
alignments = aligner.align("atgaat","atgcat")

### More than one alignment can achieve the best score. Let's print each one along with its path and score:

In [16]:
for alignment in alignments:
    print(alignment)
    print(alignment.path)
    print(alignment.score)

atg-aat
|||-|-|
atgca-t

((0, 0), (3, 3), (3, 4), (4, 5), (5, 5), (6, 6))
5.0
atga-at
|||--||
atg-cat

((0, 0), (3, 3), (4, 3), (4, 4), (6, 6))
5.0
atg-aat
|||--||
atgc-at

((0, 0), (3, 3), (3, 4), (4, 4), (6, 6))
5.0
atgaat
|||.||
atgcat

((0, 0), (6, 6))
5.0
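Each path printed above is a sequence of (target index, query index) pairs. Between consecutive pairs, either both indices advance (an aligned block) or only one does (a gap in the other sequence). The sketch below rebuilds the two gapped rows from a path in plain Python; `gapped_strings` is a hypothetical helper written for this illustration, not part of Biopython.

```python
def gapped_strings(target, query, path):
    """Rebuild the gapped target and query rows from an alignment path.

    Illustrative helper; assumes each step of the path either advances both
    indices by the same amount or advances exactly one of them.
    """
    top, bottom = [], []
    for (t0, q0), (t1, q1) in zip(path, path[1:]):
        if t1 > t0 and q1 > q0:
            # both advance: an aligned (match or mismatch) block
            top.append(target[t0:t1])
            bottom.append(query[q0:q1])
        elif t1 > t0:
            # only the target advances: gap in the query row
            top.append(target[t0:t1])
            bottom.append("-" * (t1 - t0))
        else:
            # only the query advances: gap in the target row
            top.append("-" * (q1 - q0))
            bottom.append(query[q0:q1])
    return "".join(top), "".join(bottom)


first_path = ((0, 0), (3, 3), (3, 4), (4, 5), (5, 5), (6, 6))
rows = gapped_strings("atgaat", "atgcat", first_path)
print(rows[0])  # atg-aat
print(rows[1])  # atgca-t
```

The last path, `((0, 0), (6, 6))`, has a single step in which both indices advance, so it corresponds to the ungapped alignment.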


### What if we give a penalty for gaps in the sequence?

In [36]:
aligner.gap_score = -5
print(aligner)

Pairwise sequence aligner with parameters
  match_score: 1.000000
  mismatch_score: 0.000000
  target_internal_open_gap_score: -5.000000
  target_internal_extend_gap_score: -5.000000
  target_left_open_gap_score: -5.000000
  target_left_extend_gap_score: -5.000000
  target_right_open_gap_score: -5.000000
  target_right_extend_gap_score: -5.000000
  query_internal_open_gap_score: -5.000000
  query_internal_extend_gap_score: -5.000000
  query_left_open_gap_score: -5.000000
  query_left_extend_gap_score: -5.000000
  query_right_open_gap_score: -5.000000
  query_right_extend_gap_score: -5.000000
  mode: global



In [37]:
alignments = aligner.align("atgaat","atgcat")
for alignment in alignments:
    print(alignment)

atgaat
|||.||
atgcat
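To check why only the ungapped alignment survives, we can score candidate alignments by hand. The function below is a small plain-Python sketch, not a Biopython call: `score_alignment` and its parameters are names chosen for illustration, and it charges the same score for every gap position.

```python
def score_alignment(top, bottom, match=1.0, mismatch=0.0, gap=0.0):
    """Score two equal-length gapped rows column by column.

    Illustrative sketch; '-' marks a gap and every gap position costs `gap`.
    """
    total = 0.0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


# Scores with gap_score = -5 and the default match/mismatch scores
ungapped = score_alignment("atgaat", "atgcat", gap=-5.0)
gapped = score_alignment("atg-aat", "atgc-at", gap=-5.0)
print(ungapped, gapped)  # 5.0 -5.0
```

The gapped alignment has the same five matches but pays for two gap positions, so it drops from 5 to -5 and is no longer optimal. Calling the same function with `mismatch=-5.0, gap=-1.0` gives 0.0 for the ungapped alignment and 3.0 for the gapped one, so the ranking flips.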



### What if we reduce the gap penalty and penalize mismatches instead?

In [38]:
aligner.gap_score = -1
aligner.mismatch_score = -5
print(aligner)

Pairwise sequence aligner with parameters
  match_score: 1.000000
  mismatch_score: -5.000000
  target_internal_open_gap_score: -1.000000
  target_internal_extend_gap_score: -1.000000
  target_left_open_gap_score: -1.000000
  target_left_extend_gap_score: -1.000000
  target_right_open_gap_score: -1.000000
  target_right_extend_gap_score: -1.000000
  query_internal_open_gap_score: -1.000000
  query_internal_extend_gap_score: -1.000000
  query_left_open_gap_score: -1.000000
  query_left_extend_gap_score: -1.000000
  query_right_open_gap_score: -1.000000
  query_right_extend_gap_score: -1.000000
  mode: global



In [39]:
alignments = aligner.align("atgaat","atgcat")
for alignment in alignments:
    print(alignment)

atg-aat
|||-|-|
atgca-t

atga-at
|||--||
atg-cat

atg-aat
|||--||
atgc-at



### We can also set the parameters when we create the aligner:

In [40]:
# gap_score sets every gap score to 0; extend_gap_score then overrides the extension scores
aligner = Align.PairwiseAligner(mismatch_score=0, gap_score=0, extend_gap_score=-1)
print(aligner)

Pairwise sequence aligner with parameters
  match_score: 1.000000
  mismatch_score: 0.000000
  target_internal_open_gap_score: 0.000000
  target_internal_extend_gap_score: -1.000000
  target_left_open_gap_score: 0.000000
  target_left_extend_gap_score: -1.000000
  target_right_open_gap_score: 0.000000
  target_right_extend_gap_score: -1.000000
  query_internal_open_gap_score: 0.000000
  query_internal_extend_gap_score: -1.000000
  query_left_open_gap_score: 0.000000
  query_left_extend_gap_score: -1.000000
  query_right_open_gap_score: 0.000000
  query_right_extend_gap_score: -1.000000
  mode: global



In [41]:
alignments = aligner.align("atgaat","atgcat")
for alignment in alignments:
    print(alignment)
    print(alignment.score)

atgaat
|||.||
atgcat

5.0
atg-aat
|||--||
atgc-at

5.0
atga-at
|||--||
atg-cat

5.0
atg-aat
|||-|-|
atgca-t

5.0
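All four alignments still score 5.0 because every gap here is a single position: opening a gap costs 0, and the extension score of -1 only applies from the second position of a gap onward. The sketch below scores gapped rows under this open/extend scheme in plain Python. `affine_score` is a hypothetical helper, and it assumes that a gap in one row directly followed by a gap in the other row counts as two separate gaps, which is consistent with the scores printed above.

```python
def affine_score(top, bottom, match=1.0, mismatch=0.0,
                 open_gap=0.0, extend_gap=-1.0):
    """Score two gapped rows with separate gap-open and gap-extend scores.

    Illustrative sketch; a gap of length k costs
    open_gap + (k - 1) * extend_gap.
    """
    total = 0.0
    prev_gap_row = None  # row that held the gap in the previous column
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            gap_row = "top" if a == "-" else "bottom"
            # continuing a gap in the same row extends it; otherwise open one
            total += extend_gap if gap_row == prev_gap_row else open_gap
            prev_gap_row = gap_row
        else:
            total += match if a == b else mismatch
            prev_gap_row = None
    return total


single_gaps = affine_score("atg-aat", "atgc-at")
# Made-up pair of rows with one gap of length 2, to show the extension cost
long_gap = affine_score("aa--tt", "aaggtt")
print(single_gaps, long_gap)  # 5.0 3.0
```

In the second example the four matches contribute 4, the gap opens for free, and its second position costs -1.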
