# Sequence Alignment 

+ Sequence alignment is a method of arranging DNA, RNA, or protein (amino acid) sequences to identify regions of similarity
+ The similarity identified may be the result of functional, structural, or evolutionary relationships
+ It is useful for identifying similarity and homology
+ Homology: descent from a common ancestor or source

# Terms 

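+ Match: aligned positions that hold the same character
+ Mismatch: aligned positions that hold different characters
+ Gap: a position in one sequence aligned to nothing in the other, shown as `-`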
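
To make the three terms concrete, here is a minimal sketch that counts them column by column in two already-aligned strings. The helper name `count_columns` is hypothetical (it is not part of Biopython), and it assumes gaps are written as `-`.

```python
def count_columns(aligned_a, aligned_b, gap="-"):
    """Count match, mismatch and gap columns in two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            counts["gaps"] += 1        # one side has no partner in this column
        elif a == b:
            counts["matches"] += 1     # same character on both sides
        else:
            counts["mismatches"] += 1  # different characters on both sides
    return counts


print(count_columns("ACTCGT", "ATTCG-"))
# {'matches': 4, 'mismatches': 1, 'gaps': 1}
```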

# Alignment Types 

+ Global alignment: finds the best concordance between all characters in 2 sequences
    + Aligns the sequences from end to end
    + Tool: EMBOSS Needle (Needleman-Wunsch algorithm)

+ Local alignment: finds just the subsequences that align best
    + In this method, we consider subsequences within each of the 2 sequences and try to match them to obtain the best alignment
    + Tool: EMBOSS Water (Smith-Waterman algorithm)
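
The difference between the two alignment types can be sketched with a small dynamic-programming scorer. This is an illustrative simplification, not the EMBOSS or Biopython implementation: the function name `alignment_score` is hypothetical, it returns only the best score (no traceback), and it assumes a linear gap cost passed as a non-positive number.

```python
def alignment_score(a, b, match=1, mismatch=0, gap=0, local=False):
    """Best alignment score by dynamic programming with a linear gap cost.

    local=False scores an end-to-end (global) alignment.
    local=True scores the best pair of subsequences (local alignment).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # A global alignment must pay for leading gaps.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)         # a local alignment may restart anywhere
                best = max(best, cell)
            score[i][j] = cell
    # Global: score of aligning both full sequences. Local: best cell anywhere.
    return best if local else score[-1][-1]


print(alignment_score("ACTCGT", "ATTCG", match=1, mismatch=-1, gap=-1))              # 2
print(alignment_score("ACTCGT", "ATTCG", match=1, mismatch=-1, gap=-1, local=True))  # 3
```

With penalties switched on, the local score is higher because the unmatched trailing `T` is simply left out instead of being paid for as a gap.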
    
# When to use local alignment

+ 2 sequences have a small matched region
+ 2 sequences are different lengths
+ Overlapping sequences
+ One sequence is a subsequence of the other
+ Tools that perform local alignment: BLAST, EMBOSS Water
    

In [7]:
from Bio.Seq import Seq

In [8]:
from Bio import pairwise2
from Bio.pairwise2 import format_alignment

In [9]:
seq1 = Seq('ACTCGT')
seq2 = Seq('ATTCG')

In [10]:
# Global Alignment
alignments = pairwise2.align.globalxx(seq1, seq2)

In [11]:
alignments

[Alignment(seqA='ACT-CGT', seqB='A-TTCG-', score=4.0, start=0, end=7),
 Alignment(seqA='AC-TCGT', seqB='A-TTCG-', score=4.0, start=0, end=7),
 Alignment(seqA='ACTCGT', seqB='ATTCG-', score=4.0, start=0, end=6)]

In [12]:
# To display the alignment

print(format_alignment(*alignments[0]))

ACT-CGT
| | || 
A-TTCG-
  Score=4



In [14]:
# Local Alignment
loc_alignments = pairwise2.align.localxx(seq1, seq2)

In [16]:
for a in loc_alignments:
    print(format_alignment(*a))

1 ACT-CG
  | | ||
1 A-TTCG
  Score=4

1 AC-TCG
  |  |||
1 A-TTCG
  Score=4

1 ACTCG
  |.|||
1 ATTCG
  Score=4



In [21]:
# Get only the alignment score

alignment2 = pairwise2.align.globalxx(seq1, seq2, one_alignment_only=True, score_only=True)

In [22]:
alignment2

4.0

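# Percentage similarity from the alignment score

+ Percentage similarity = (number of identical nucleotides / total number of nucleotides) * 100 %
+ With `globalxx` scoring (1 point per match, no penalties), the score equals the number of identical nucleotides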
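
The formula leaves open what "total number of nucleotides" means, and the choice changes the result. The sketch below assumes a hypothetical helper `percent_similarity` and uses the 4 matches found for `ACTCGT` and `ATTCG` with three possible denominators.

```python
def percent_similarity(matches, length):
    """Percentage of positions that are identical, given a chosen length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return matches / length * 100


matches = 4  # identical nucleotides between ACTCGT and ATTCG

print(round(percent_similarity(matches, 6), 1))  # 66.7, relative to the longer sequence
print(round(percent_similarity(matches, 5), 1))  # 80.0, relative to the shorter sequence
print(round(percent_similarity(matches, 7), 1))  # 57.1, relative to a 7-column alignment
```

Whichever denominator is used, it should be stated alongside the percentage so that values from different runs stay comparable.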

In [23]:
seq1

Seq('ACTCGT')

In [25]:
seq2

Seq('ATTCG')

In [24]:
alignment2/len(seq1) * 100

66.66666666666666

In [29]:
# Get only the local alignment score

loc_alignment2 = pairwise2.align.localxx(seq1, seq2, one_alignment_only=True, score_only=True)

In [30]:
loc_alignment2/len(seq1) * 100

66.66666666666666

### Find all possible global alignments with the maximum similarity score

+ Each matching character: 2 points
+ Each mismatching character: -1 point
+ 0.5 points are deducted when opening a gap
+ 0.1 points are deducted for each position that extends it
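
To see how these four numbers combine into one score, here is a minimal sketch that scores an already-aligned pair by hand. The function `score_alignment` is hypothetical and is not how `pairwise2` works internally. It assumes that the first position of a gap costs only the opening penalty, that each further position costs the extension penalty, and that a gap switching from one sequence to the other counts as a new gap.

```python
def score_alignment(aligned_a, aligned_b, match=2, mismatch=-1,
                    gap_open=-0.5, gap_extend=-0.1, gap="-"):
    """Score two aligned strings under match/mismatch and affine gap penalties."""
    total = 0.0
    prev_gap_in = None  # which sequence held the gap in the previous column
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            gap_in = "a" if a == gap else "b"
            # Continuing a gap in the same sequence extends it; otherwise it opens.
            total += gap_extend if gap_in == prev_gap_in else gap_open
            prev_gap_in = gap_in
        else:
            total += match if a == b else mismatch
            prev_gap_in = None
    return total


print(score_alignment("ACT-CGT", "A-TTCG-"))  # 6.5 = 4 matches * 2 - 3 gap openings * 0.5
print(score_alignment("ACTCGT", "ATTCG-"))    # 6.5 = 4 matches * 2 - 1 mismatch - 1 gap opening
```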

In [33]:
# Global alignment with the maximum similarity score
glb_alignment = pairwise2.align.globalms(seq1, seq2, 2, -1, -0.5, -0.1)

In [34]:
# View all
for a in glb_alignment:
    print(format_alignment(*a))

ACT-CGT
| | || 
A-TTCG-
  Score=6.5

AC-TCGT
|  ||| 
A-TTCG-
  Score=6.5

ACTCGT
|.||| 
ATTCG-
  Score=6.5

