<h1 id="toctitle">Automated testing</h1>
<ul id="toc"/>

Two tools to look at:

- `assert` to get started with testing concepts 

__Some examples are deliberately written sub-optimally!__

## Testing with `assert`

Write a function which takes a DNA sequence, a kmer length, and a threshold, and returns a list of all the kmers that occur at least the threshold number of times in the sequence. First attempt:

In [None]:
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna)):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold:
            result.append(kmer)
    return result

Now a very simple test: can our function correctly figure out that 'atgaatgc' contains 'atg' twice?

In [None]:
find_common_kmers('atgaatgc', 3, 2)

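The function returns `['atg', 'atg']`: the kmer is listed once for each position where it occurs. Is this correct? There is no way to tell from the description; it depends on what we want to use the output for.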

**Testing forces us to think carefully about how we want our code to behave.**

### Write the test before you fix the bug

Let's say that we only want unique kmers in the output. We can test this:

In [None]:
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']

As expected, the test fails. Let's edit the code to fix it:

In [None]:
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna)):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

assert find_common_kmers('atgaatgc', 3, 2) == ['atg']

Now it runs without error. Why bother writing the test if we're going to fix it anyway? Because bugs have a habit of re-emerging when you start editing the code. 

A more complicated example: what do we expect the output of the following call to be?

```python
find_common_kmers('atgaatgcaaatga', 3, 3)
```

'atg' is in the sequence three times, and no other 3mer occurs more than twice, so we should see `['atg']`. An assertion expresses this idea:

In [None]:
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']

Again it fails, but why? To find out, we probably have to look at the return value:

In [None]:
find_common_kmers('atgaatgcaaatga', 3, 3)
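
An `assert` statement can also carry a message, which saves the extra step of calling the function again to see what it returned. The sketch below repeats the second version of the function from above (still with the bug) and wraps the assertion in `try`/`except` purely so that the cell runs to the end; the names `expected`, `actual` and `message` are illustrative.

```python
# second version of the function from above, still containing the bug
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna)):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

expected = ['atg']
actual = find_common_kmers('atgaatgcaaatga', 3, 3)

try:
    # the expression after the comma becomes the AssertionError message
    assert actual == expected, f"expected {expected!r}, got {actual!r}"
except AssertionError as error:
    message = str(error)
    print(message)  # expected ['atg'], got ['atg', 'a']
```

In a real test we would normally let the `AssertionError` propagate rather than catch it.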

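There is a bug in the kmer-generating code: the output contains `'a'` as well as `'atg'`. We have forgotten to make sure that we only get complete kmers, so the slices taken near the end of the sequence are shorter than `k`. We can fix it by tweaking the call to `range()`: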

In [None]:
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result



Now both our assertions run without errors:

In [None]:
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']

### How many tests to write?

As soon as we start thinking about testing, it's obvious that there is an infinite number of possible tests. A good way to write tests efficiently is to test extreme inputs: if the function works for k=1 and k=10 then it probably works for k=2, 3, 4, etc.

In [None]:
assert find_common_kmers('aattggcc', 1, 2) == ['a', 't', 'g', 'c']
assert find_common_kmers('tagctagtcg', 10, 1) == ['tagctagtcg']
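
A few more extreme inputs are worth a look. The cases below are illustrative additions, not part of the original suite, and the expected values are what the version of the function with the corrected `range()` call returns; whether an empty list is the right answer for each of them is a design decision.

```python
# version of the function with the corrected range() call
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

# k exactly as long as the sequence: one complete kmer
exact_length = find_common_kmers('atg', 3, 1)
# k longer than the sequence: range() is empty, so no kmers at all
too_long = find_common_kmers('atg', 4, 1)
# empty sequence
empty_sequence = find_common_kmers('', 3, 1)
# threshold that no kmer can reach
high_threshold = find_common_kmers('atgatg', 3, 5)

print(exact_length, too_long, empty_sequence, high_threshold)
```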

Another good idea is to test the function on unrealistic inputs. Example: for kmer length of zero we might expect an empty list:

In [None]:
assert find_common_kmers('tagctagtcg', 0, 2) == []

but in fact we get a list with a single element which is an empty string:

In [None]:
find_common_kmers('tagctagtcg', 0, 2)

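Here's why: slicing with `k=0` gives an empty string, and `count('')` returns the length of the string plus one (11 here), which is well above the threshold: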

In [None]:
'tagctagtcg'[0:0]

In [None]:
'tagctagtcg'.count('')

To fix this let's put in a special case for k<1:

In [None]:
def find_common_kmers(dna, k, threshold):
    if k < 1:
        return []
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

Plus a few more assertions to make a test suite:

In [None]:
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']
assert find_common_kmers('aattggcc', 1, 2) == ['a', 't', 'g', 'c']
assert find_common_kmers('tagctagtcg', 10, 1) == ['tagctagtcg'] 
assert find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4) ==  ['ct', 'tg', 'gt']
assert find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4) ==  ['ccca', 'ccaa', 'caaa', 'aaaa']
assert find_common_kmers('tagctagtcg', 0, 2) == []
assert find_common_kmers('tagctagtcg', -3, 2) == []
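
A run of bare assertions stops at the first failure. One possible alternative, sketched here, is to keep the cases as data and loop over them so that every failure is reported. The names `cases` and `run_suite` are hypothetical; the inputs and expected values are copied from the assertions above.

```python
# version of the function with the special case for k < 1
def find_common_kmers(dna, k, threshold):
    if k < 1:
        return []
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

# (arguments, expected result) pairs, copied from the assertions above
cases = [
    (('atgaatgcaaatga', 3, 3), ['atg']),
    (('atgaatgc', 3, 2), ['atg']),
    (('aattggcc', 1, 2), ['a', 't', 'g', 'c']),
    (('tagctagtcg', 10, 1), ['tagctagtcg']),
    (('ctagctgctcgtgactgtcagtgtacg', 2, 4), ['ct', 'tg', 'gt']),
    (('cccaaaacccaaaacccaaaacccaaaa', 4, 4), ['ccca', 'ccaa', 'caaa', 'aaaa']),
    (('tagctagtcg', 0, 2), []),
    (('tagctagtcg', -3, 2), []),
]

def run_suite(function):
    """Run every case and return a list of (args, expected, actual) for the failures."""
    failures = []
    for args, expected in cases:
        actual = function(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures

failures = run_suite(find_common_kmers)
print(len(cases) - len(failures), "passed,", len(failures), "failed")
```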

### Refactoring and regressions

Let's do a few quick benchmarks:

In [None]:
import random
def random_dna(length):
    return "".join([random.choice(['A','T','G','C']) for _ in range(length)])

r = random_dna(2000)
%timeit find_common_kmers(r, 8, 1000)

r = random_dna(20000)
%timeit find_common_kmers(r, 8, 1000)
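
`%timeit` is an IPython magic, so it only works in a notebook or an IPython shell. In a plain Python script, the standard library's `timeit` module gives a rough equivalent. This is a minimal sketch; the shorter sequence lengths and the `timings` dict are assumptions chosen to keep the run quick.

```python
import random
import timeit

def random_dna(length):
    return "".join(random.choice("ATGC") for _ in range(length))

# the list-based version of the function from above
def find_common_kmers(dna, k, threshold):
    if k < 1:
        return []
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

timings = {}
for length in (1000, 10000):
    r = random_dna(length)
    # run the call once per measurement, three times over, and keep the fastest
    runs = timeit.repeat(lambda: find_common_kmers(r, 8, 1000), number=1, repeat=3)
    timings[length] = min(runs)
    print(length, "bases:", timings[length], "seconds")
```

The absolute numbers depend on the machine; the ratio between the two timings is the interesting part.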

Increasing the length of the DNA sequence tenfold increases the running time roughly a hundredfold. This function doesn't scale well: it calls `count()`, which scans the whole sequence, once for every start position. We would like to rewrite it to be faster **without** changing its behaviour.

We call this **refactoring**.

Here's an attempt at a better version which uses a dict to keep a running total of kmers:

In [None]:
def find_common_kmers(dna, k, threshold):
   
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        old_count = kmer2count.get(kmer, 0)
        kmer2count[kmer] = old_count + 1
    
    result = []
    for kmer, count in kmer2count.items():
        if count >= threshold:
            result.append(kmer)
    return result


First let's run benchmarks again to see if it's actually faster:

In [None]:
r = random_dna(2000)
%timeit find_common_kmers(r, 8, 1000)

r = random_dna(20000)
%timeit find_common_kmers(r, 8, 1000)

Much better - but how do we know that we haven't changed the behaviour? Just re-run the tests:

In [None]:
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']
assert find_common_kmers('aattggcc', 1, 2) == ['a', 't', 'g', 'c']
assert find_common_kmers('tagctagtcg', 10, 1) == ['tagctagtcg'] 
assert find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4) ==  ['ct', 'tg', 'gt']
assert find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4) ==  ['ccca', 'ccaa', 'caaa', 'aaaa']
assert find_common_kmers('tagctagtcg', 0, 2) == []
assert find_common_kmers('tagctagtcg', -3, 2) == []

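Something interesting: on Python versions before 3.7, where dicts do not keep track of insertion order, the order of the output can be different:

In [None]:
find_common_kmers('aattggcc', 1, 2)

On those versions the new code doesn't preserve the input order (from Python 3.7 onwards dicts keep insertion order, so the order happens to survive). Do we care about this? Probably not, so let's rewrite the tests to compare sets: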

In [None]:
assert set(find_common_kmers('aattggcc', 1, 2)) == set(['a', 't', 'g', 'c'])

In [None]:
assert set(find_common_kmers('atgaatgcaaatga', 3, 3)) == set(['atg'])
assert set(find_common_kmers('atgaatgc', 3, 2)) == set(['atg'])
assert set(find_common_kmers('aattggcc', 1, 2)) == set(['a', 't', 'g', 'c'])
assert set(find_common_kmers('tagctagtcg', 10, 1)) == set(['tagctagtcg'])
assert set(find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4)) ==  set(['ct', 'tg', 'gt'])
assert set(find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4)) ==  set(['ccca', 'ccaa', 'caaa', 'aaaa'])
assert set(find_common_kmers('tagctagtcg', 0, 2)) == set([])
assert set(find_common_kmers('tagctagtcg', -3, 2)) == set([])
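
One caveat with comparing sets: a set also discards duplicates, so these tests would no longer notice if the function went back to returning the same kmer twice, which is the bug the very first assertion was written to catch. If we want to ignore order but still catch duplicates, comparing sorted lists is one option. The helper name `same_items` below is hypothetical.

```python
def same_items(actual, expected):
    """True if both lists hold the same items the same number of times, in any order."""
    return sorted(actual) == sorted(expected)

# output of the very first version of the function, with the duplicate
duplicated = ['atg', 'atg']

# the set comparison hides the duplicate ...
set_check = set(duplicated) == set(['atg'])
# ... but the sorted-list comparison still catches it
sorted_check = same_items(duplicated, ['atg'])
# a different order alone is accepted
order_check = same_items(['c', 'a', 'g', 't'], ['a', 't', 'g', 'c'])

print(set_check, sorted_check, order_check)  # True False True
```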

This time we get down to the 7th test before something fails:

In [None]:
find_common_kmers('tagctagtcg', 0, 2)

Same problem, same solution:

In [None]:
def find_common_kmers(dna, k, threshold):
    if k < 1:
        return []
   
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        old_count = kmer2count.get(kmer, 0)
        kmer2count[kmer] = old_count + 1
    
    result = []
    for kmer, count in kmer2count.items():
        if count >= threshold:
            result.append(kmer)
    return result

assert set(find_common_kmers('atgaatgcaaatga', 3, 3)) == set(['atg'])
assert set(find_common_kmers('atgaatgc', 3, 2)) == set(['atg'])
assert set(find_common_kmers('aattggcc', 1, 2)) == set(['a', 't', 'g', 'c'])
assert set(find_common_kmers('tagctagtcg', 10, 1)) == set(['tagctagtcg'])
assert set(find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4)) ==  set(['ct', 'tg', 'gt'])
assert set(find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4)) ==  set(['ccca', 'ccaa', 'caaa', 'aaaa'])
assert set(find_common_kmers('tagctagtcg', 0, 2)) == set([])
assert set(find_common_kmers('tagctagtcg', -3, 2)) == set([])

We have re-introduced an old bug by rewriting the code to fix a different problem. This is called a **regression**. 

We have caught the bug before running the code on any real-life data.
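
A passing suite only shows that the behaviour is unchanged for the cases it contains. One difference that none of the assertions above exercise is that `str.count()` counts non-overlapping matches, while the dict version counts every start position. The sketch below puts the two versions side by side under the hypothetical names `count_version` and `dict_version`; which result is the right one depends on what we want the function to do, so it is worth deciding and adding a test for it.

```python
# the list-based version that relies on str.count()
def count_version(dna, k, threshold):
    if k < 1:
        return []
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

# the refactored version that keeps a running total in a dict
def dict_version(dna, k, threshold):
    if k < 1:
        return []
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        kmer2count[kmer] = kmer2count.get(kmer, 0) + 1
    return [kmer for kmer, count in kmer2count.items() if count >= threshold]

# 'aa' starts at positions 0, 1 and 2 of 'aaaa', but count() only
# finds the two non-overlapping copies
non_overlapping = 'aaaa'.count('aa')
old_result = count_version('aaaa', 2, 3)
new_result = dict_version('aaaa', 2, 3)

print(non_overlapping, old_result, new_result)  # 2 [] ['aa']
```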

### Setting up and tearing down

Our `find_common_kmers()` function has no **side effects**, which means that it's easy to test. 

Now look at a function which does have side effects. We want to take a collection of reads and filter out any that have too many Ns:

In [None]:
def filter_reads(reads, threshold): 
    # iterate over a copy of the reads, so we don't alter the list as we're iterating over it
    for read in list(reads): 
        if read.count('N') >= threshold: 
            reads.remove(read)
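
To make the side effect concrete: the function returns nothing, and its whole result is the change it makes to the list that was passed in. This is a small sketch with an illustrative list called `my_reads`.

```python
def filter_reads(reads, threshold):
    # iterate over a copy of the reads, so we don't alter the list as we're iterating over it
    for read in list(reads):
        if read.count('N') >= threshold:
            reads.remove(read)

my_reads = ['ATCGTAC', 'ACTGNNTACTG']
return_value = filter_reads(my_reads, 2)

print(return_value)  # None: there is no return value to test
print(my_reads)      # ['ATCGTAC']: the caller's list has been modified in place
```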

Make some reads with 0/1/2 Ns:

In [None]:
reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

Now we'll test that it works with a threshold of one, which should cause the last two reads to be removed:

In [None]:
filter_reads(reads, 1)
assert reads == ['ATCGTAC']

Everything looks OK. Next, we'll test a threshold of two which should remove only the last read:

In [None]:
filter_reads(reads, 2)
assert reads == ['ATCGTAC', 'ACTGNTTACGT']

Of course, by the time we get to the second test, the last two reads have already been removed, so the second assertion fails. We need to recreate the reads list each time:

In [None]:
reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
filter_reads(reads, 1)
assert reads == ['ATCGTAC']

reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
filter_reads(reads, 2)
assert reads == ['ATCGTAC', 'ACTGNTTACGT']

reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
filter_reads(reads, 3)
assert reads == ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

Imagine a much more complicated set-up scenario, where most of the test code is set-up. We can turn the set-up code into a function:

In [None]:
reads = []
def create_reads():
    global reads
    reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

create_reads()  
filter_reads(reads, 1)
assert reads == ['ATCGTAC']

create_reads()
filter_reads(reads, 2)
assert reads == ['ATCGTAC', 'ACTGNTTACGT']

create_reads()    
filter_reads(reads, 3)
assert reads == ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
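
A variation on the same idea that avoids the `global` statement is to have the set-up function return a fresh list on every call. This is only a sketch: `make_reads` and `expected_by_threshold` are hypothetical names, and the expected values are copied from the assertions above.

```python
def filter_reads(reads, threshold):
    # iterate over a copy of the reads, so we don't alter the list as we're iterating over it
    for read in list(reads):
        if read.count('N') >= threshold:
            reads.remove(read)

def make_reads():
    # set-up: build a brand new list each time, so no test sees another test's changes
    return ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

expected_by_threshold = {
    1: ['ATCGTAC'],
    2: ['ATCGTAC', 'ACTGNTTACGT'],
    3: ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG'],
}

for threshold, expected in expected_by_threshold.items():
    reads = make_reads()
    filter_reads(reads, threshold)
    assert reads == expected
```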
