# Evolution as a Sequence of Mistakes

A mutation is simply a mistake that occurs during the creation or copying of a nucleic acid, in particular DNA. Because nucleic acids are vital to cellular functions, mutations tend to cause a ripple effect throughout the cell. Although mutations are technically mistakes, a very rare mutation may equip the cell with a beneficial attribute. In fact, the macro effects of evolution are attributable to the accumulation of beneficial microscopic mutations over many generations.

The simplest and most common type of nucleic acid mutation is a point mutation, which replaces one base with another at a single nucleotide. In the case of DNA, a point mutation must change the complementary base accordingly; see Figure 1.
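To make the pairing constraint concrete, here is a minimal Python sketch. It assumes a double helix can be modeled as two aligned strings, where the bottom string holds the complement of each top-strand base at the same position (not reverse-complemented). The names `COMPLEMENT`, `complement_strand`, and `point_mutation` are hypothetical, introduced only for illustration.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement, aligned with the input (not reversed)."""
    return "".join(COMPLEMENT[base] for base in strand)


def point_mutation(top: str, position: int, new_base: str) -> tuple[str, str]:
    """Replace the base at `position` (0-based) and return (top, bottom).

    The bottom strand is recomputed so every position stays correctly paired.
    """
    if not 0 <= position < len(top):
        raise IndexError("position out of range")
    if new_base not in COMPLEMENT:
        raise ValueError(f"invalid base: {new_base!r}")
    if top[position] == new_base:
        raise ValueError("a point mutation must change the base")
    mutated = top[:position] + new_base + top[position + 1:]
    return mutated, complement_strand(mutated)
```

For example, `point_mutation("GATTACA", 2, "C")` returns `("GACTACA", "CTGATGT")`: the T→C change on the top strand forces the paired A on the bottom strand to become G.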

Two DNA strands taken from the genomes of different organisms or species are homologous if they share a recent ancestor; thus, counting the number of bases at which homologous strands differ gives the minimum number of point mutations that could have occurred on the evolutionary path between the two strands.
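Why is this count a minimum? A point mutation changes exactly one position, so it can remove at most one mismatch; if two strands differ at $k$ positions, at least $k$ point mutations are needed to turn one into the other, and fixing the mismatches one at a time shows that $k$ also suffice. The hypothetical helper `mutation_path` below sketches that second half of the argument by building one shortest chain of single-base substitutions. It considers substitutions only (no insertions or deletions) and says nothing about the real order of events, which cannot be recovered from two strands alone.

```python
def mutation_path(s: str, t: str) -> list[str]:
    """Return one shortest chain of point mutations that turns s into t.

    The chain starts at s and ends at t; consecutive entries differ at
    exactly one position, so len(chain) - 1 equals the mismatch count.
    """
    if len(s) != len(t):
        raise ValueError("Strands must be of equal length.")
    chain = [s]
    current = list(s)
    for i, (a, b) in enumerate(zip(s, t)):
        if a != b:
            current[i] = b  # one point mutation removes this mismatch
            chain.append("".join(current))
    return chain


chain = mutation_path("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")
for step in chain:
    print(step)
print(len(chain) - 1)  # 7 point mutations
```

Applied to this problem's sample strands, the chain has 7 steps, matching the expected answer.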

We are interested in minimizing the number of (point) mutations separating two species because of the biological principle of parsimony, which demands that evolutionary histories be explained as simply as possible.

[Link to Rosalind](https://rosalind.info/problems/hamm/)

# Problem

Given two strings $s$ and $t$ of equal length, the Hamming distance between $s$ and $t$, denoted $d_{H}(s,t)$, is the number of corresponding symbols that differ in $s$ and $t$. See Figure 2.
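Written out position by position, for strings of common length $n$ the definition amounts to the sum below, where $\mathbf{1}[\cdot]$ is the indicator function (1 if the condition holds, 0 otherwise):

$$
d_{H}(s,t) = \sum_{i=1}^{n} \mathbf{1}\left[\, s_i \neq t_i \,\right], \qquad n = |s| = |t|.
$$

For example, $d_{H}(\mathtt{GAGC}, \mathtt{CATC}) = 1 + 0 + 1 + 0 = 2$.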

<span style="color:rgba(70,165,70,255); font-weight:bold">Given</span>: Two DNA strings $s$ and $t$ of equal length (not exceeding 1 kbp).

<span style="color:rgba(70,165,70,255); font-weight:bold">Return</span>: The Hamming distance $d_{H}(s,t)$.

# Read Example Input and Output Files

```python
%run ../../functions/read_files.ipynb
```
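`read_text` comes from `../../functions/read_files.ipynb`, which is not reproduced here. Judging only from how it is used (it returns a file's contents as a string that is then split into lines), a minimal stand-in might look like the sketch below. Stripping surrounding whitespace is an assumption, not the helper's confirmed behavior.

```python
from pathlib import Path


def read_text(path: str) -> str:
    """Hypothetical stand-in for the notebook helper of the same name.

    Returns the file's contents as one string with leading and trailing
    whitespace (such as a final newline) stripped; the stripping is assumed.
    """
    return Path(path).read_text(encoding="utf-8").strip()
```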

```python
input = read_text('sample_input.txt')
print(input)

output = read_text('sample_output.txt')
print(output)
```

```
GAGCCTACTAACGGGAT
CATCGTAATGACGGCCT
7
```


# Problem Solving Logic

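```python
def compute_hamming_distance(text):
    """Return the Hamming distance between the first two lines of `text`."""
    # Keep the non-empty lines; splitlines() also copes with Windows-style
    # "\r\n" endings and a trailing newline, and split() is called only once.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    str1, str2 = lines[0], lines[1]

    if len(str1) != len(str2):
        raise ValueError("Strings must be of equal length.")

    # Count the positions at which the aligned symbols differ.
    return sum(char1 != char2 for char1, char2 in zip(str1, str2))


print(compute_hamming_distance(input))
```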


```
7
```


```python
compute_hamming_distance(input) == int(output)
```

```
True
```

# Run Real Input

```python
real_input = read_text('rosalind_hamm.txt')

print(compute_hamming_distance(real_input))
```


```
522
```
