Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent alignment to alignment score #1

Closed
ksahlin opened this issue Jan 29, 2016 · 4 comments
Closed

Inconsistent alignment to alignment score #1

ksahlin opened this issue Jan 29, 2016 · 4 comments
Labels

Comments

@ksahlin
Copy link
Contributor

ksahlin commented Jan 29, 2016

Hi,

I'm providing an example with an inconsistent alignment score and the actual alignment below (Original post from mengyao/Complete-Striped-Smith-Waterman-Library#29):

client-104-39-79-38:workspace kxs624$ python
Python 2.7.9 (default, Dec  1 2015, 18:18:28) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import ssw
>>> ref_seq = "CCC" + "AGCT"*10
>>> query_seq = "AGGT"*10
>>> aligner = ssw.Aligner(ref_seq, gap_open=1, gap_extend=1)
>>> alignment = aligner.align(query_seq)
>>> print(alignment.alignment_report)
Score = 40, Matches = 0, Mismatches = 38, Insertions = 0, Deletions = 0

ref   4   CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
          **************************************
query 1   AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAG

For clarity, below is what I expect the result to look like:

CCCAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT
   ||*|||*|||*|||*|||*|||*|||*|||*|||*|||*|
---AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGT

I see that you have removed the match and mismatch options. What are these penalties set to, as the score is now 40?

@vishnubob
Copy link
Owner

Thanks for the bug report. I just pushed a fixed for this, and added your example to the unit tests. I moved the match and mismatch scores to the Matrix class. If you wish to customize your match and mismatch values, you can create your own Matrix instance, and pass it as part of the Aligner constructor. In the future, I intend to add more support for ambiguity bases and protein alignments, which is why I broke the Matrix into its own class.

@vishnubob vishnubob added the bug label Jan 29, 2016
@ksahlin
Copy link
Contributor Author

ksahlin commented Jan 29, 2016

Thank you, this solves a big part of my problem. Here is the new output

>>> print(alignment.alignment_report())
Score = 40, Matches = 29, Mismatches = 9, Insertions = 0, Deletions = 0

ref   4   AGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG
          ||*|||*|||*|||*|||*|||*|||*|||*|||*|||
query 1   AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAG

It seems like the last two characters are cut from this alignment, probably because they contain one mismatch and one match so the score is 0 for these two last characters --- and one alignment is returned. I thought I can solve it by letting e.g. mismatch = -1 and match =1. I issued:

>>> score_matrix = ssw.ScoreMatrix(match=2,mismatch=-1)
>>> aligner = ssw.Aligner(ref_seq, matrix=score_matrix, gap_open=1, gap_extend=1)
>>> alignment = aligner.align(query_seq)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kxs624/.pyenv/versions/2.7.9/lib/python2.7/site-packages/ssw/sswobj.py", line 95, in align
    res = self._align(query, reference, flags, filter_score, filter_distance, mask_length)
  File "/Users/kxs624/.pyenv/versions/2.7.9/lib/python2.7/site-packages/ssw/sswobj.py", line 104, in _align
    _query = self.matrix.convert_sequence_to_ints(query)
  File "/Users/kxs624/.pyenv/versions/2.7.9/lib/python2.7/site-packages/ssw/sswobj.py", line 62, in convert_sequence_to_ints
    return _seq_type(*seq_generator)
  File "/Users/kxs624/.pyenv/versions/2.7.9/lib/python2.7/site-packages/ssw/sswobj.py", line 61, in <genexpr>
    seq_generator = (self.symbol_map[symbol.upper()] for symbol in seq)
KeyError: 'A'

@vishnubob
Copy link
Owner

The problem is that you need to use the DNA_ScoreMatrix, since ScoreMatrix has no alphabet defined.

reference = "CCC" + "AGCT" * 10
query = "AGGT" * 10
score_matrix = ssw.DNA_ScoreMatrix(match=2, mismatch=-1)
aligner = ssw.Aligner(matrix=score_matrix)
alignment = aligner.align(query, reference)
print(alignment.alignment_report())
Score = 50, Matches = 30, Mismatches = 10, Insertions = 0, Deletions = 0

ref   4   AGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCT
          ||*|||*|||*|||*|||*|||*|||*|||*|||*|||*|
query 1   AGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGTAGGT

@ksahlin
Copy link
Contributor Author

ksahlin commented Jan 29, 2016

Thank you, that solves it!

@ksahlin ksahlin closed this as completed Jan 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants