<h1 id="toctitle">Automated testing</h1>
<ul id="toc"/>

Two tools to look at:

- `assert` to get started with testing concepts 
- `nose` for more convenience

__Some examples are deliberately written sub-optimally!__

## Testing with `assert`

Write a function which takes a DNA sequence, a kmer length, and a threshold, and returns a list of all the kmers that occur at least the threshold number of times in the sequence. First attempt:

In [4]:
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna)):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold:
            result.append(kmer)
    return result

Now a very simple test: can our function correctly figure out that 'atgaatgc' contains 'atg' twice?

In [2]:
find_common_kmers('atgaatgc', 3, 2)

['atg', 'atg']

Is this correct? No way to tell from the description; it depends on what we want to use the output for. 

**Testing forces us to think carefully about how we want our code to behave.**

### Write the test before you fix the bug

Let's say that we only want unique kmers in the output. We can test this:

In [5]:
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']

AssertionError: 

As expected, the test fails. Let's edit the code to fix it:

In [7]:
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna)):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

assert find_common_kmers('atgaatgc', 3, 2) == ['atg']

Now it runs without error. Why bother writing the test if we're going to fix it anyway? Because bugs have a habit of re-emerging when you start editing the code. 

A more complicated example: what do we expect the output to be from 

```python
find_common_kmers('atgaatgcaaatga', 3, 3)
```

? 'atg' is in the sequence three times, and no other 3mer occurs more than twice, so we should see `['atg']`. An assertion expresses this idea:

In [8]:
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']

AssertionError: 

Again it fails, but why? To figure it out, we probably have to look at the return value:

In [9]:
find_common_kmers('atgaatgcaaatga', 3, 3)

['atg', 'a']

There is a bug in the kmer generating code. We have forgotten to make sure that we only get complete kmers. We can fix it by tweaking the call to `range()`:

In [10]:
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna) +1 -k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result
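
To see why the original loop produced incomplete kmers, here is a minimal sketch comparing the two calls to `range()` on the same sequence (the variable names `old_kmers` and `new_kmers` are only for illustration):

```python
dna = 'atgaatgcaaatga'
k = 3

# Slicing past the end of a string does not raise an error;
# it quietly returns a shorter string.
old_kmers = [dna[start:start + k] for start in range(len(dna))]
new_kmers = [dna[start:start + k] for start in range(len(dna) + 1 - k)]

print(old_kmers[-2:])                             # the two incomplete kmers at the end
print(all(len(kmer) == k for kmer in new_kmers))  # every kmer is now complete
```

The stray `'a'` in the failed test is the very last slice from the old loop.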



Now both our assertions run without errors:

In [11]:
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']

### How many tests to write?

As soon as we start thinking about testing, it's obvious that there are an infinite number of possible tests. A good way to write tests efficiently is to test extreme inputs. If it works for k=1 and k=10 then it probably works for k=2,3,4,etc.

In [12]:
assert find_common_kmers('aattggcc', 1, 2) == ['a', 't', 'g', 'c']
assert find_common_kmers('tagctagtcg', 10, 1) == ['tagctagtcg']
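
A few more extreme inputs worth considering, as a sketch: an empty sequence, a sequence shorter than k, and a threshold that is never reached. The expected values are an assumption about the behaviour we want (an empty list in each case); the function is repeated so the cell runs on its own.

```python
def find_common_kmers(dna, k, threshold):
    # same version as above, repeated so this cell is self-contained
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

# Assumed desired behaviour: nothing to report in any of these cases
assert find_common_kmers('', 3, 1) == []      # empty sequence
assert find_common_kmers('at', 3, 1) == []    # sequence shorter than k
assert find_common_kmers('atgc', 1, 2) == []  # threshold never reached
```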

Another good idea is to test the function on unrealistic inputs. Example: for kmer length of zero we might expect an empty list:

In [13]:
assert find_common_kmers('tagctagtcg', 0, 2) == []

AssertionError: 

but in fact we get a list with a single element which is an empty string:

In [14]:
find_common_kmers('tagctagtcg', 0, 2)

['']

Here's why:

In [15]:
'tagctagtcg'[0:0]

''

In [16]:
'tagctagtcg'.count('')

11

To fix this let's put in a special case for k<1:

In [29]:
def find_common_kmers(dna, k, threshold):
    if k < 1:
        return []
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

Plus a few more assertions to make a test suite:

In [18]:
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']
assert find_common_kmers('aattggcc', 1, 2) == ['a', 't', 'g', 'c']
assert find_common_kmers('tagctagtcg', 10, 1) == ['tagctagtcg'] 
assert find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4) ==  ['ct', 'tg', 'gt']
assert find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4) ==  ['ccca', 'ccaa', 'caaa', 'aaaa']
assert find_common_kmers('tagctagtcg', 0, 2) == []
assert find_common_kmers('tagctagtcg', -3, 2) == []

### Refactoring and regressions

Let's do a few quick benchmarks:

In [31]:
import random
def random_dna(length):
    return "".join([random.choice(['A','T','G','C']) for _ in range(length)])

r = random_dna(2000)
%timeit find_common_kmers(r, 8, 1000)

r = random_dna(20000)
%timeit find_common_kmers(r, 8, 1000)

100 loops, best of 3: 9.09 ms per loop
1 loop, best of 3: 852 ms per loop


Increasing the length of the DNA sequence x 10 increases the running time roughly x 100. This function doesn't scale well: it calls `count()`, which scans the whole sequence, once for every kmer. We would like to rewrite it to be faster **without** changing its behaviour. 

We call this **refactoring**.

Here's an attempt at a better version which uses a dict to keep a running total of kmers:

In [32]:
def find_common_kmers(dna, k, threshold):
   
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        old_count = kmer2count.get(kmer, 0)
        kmer2count[kmer] = old_count + 1
    
    result = []
    for kmer, count in kmer2count.items():
        if count >= threshold:
            result.append(kmer)
    return result


First let's run benchmarks again to see if it's actually faster:

In [34]:
r = random_dna(2000)
%timeit find_common_kmers(r, 8, 1000)

r = random_dna(20000)
%timeit find_common_kmers(r, 8, 1000)

The slowest run took 4.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 456 µs per loop
100 loops, best of 3: 4.62 ms per loop


Much better - but how do we know that we haven't changed the behaviour? Just re-run the tests:

In [35]:
assert find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']
assert find_common_kmers('atgaatgc', 3, 2) == ['atg']
assert find_common_kmers('aattggcc', 1, 2) == ['a', 't', 'g', 'c']
assert find_common_kmers('tagctagtcg', 10, 1) == ['tagctagtcg'] 
assert find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4) ==  ['ct', 'tg', 'gt']
assert find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4) ==  ['ccca', 'ccaa', 'caaa', 'aaaa']
assert find_common_kmers('tagctagtcg', 0, 2) == []
assert find_common_kmers('tagctagtcg', -3, 2) == []

AssertionError: 

Something interesting - now the order is different:

In [36]:
find_common_kmers('aattggcc', 1, 2)

['a', 'c', 't', 'g']

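The new version of the code doesn't preserve the input order (in the Python 2.7 used here dicts are unordered; from Python 3.7 dicts keep insertion order, so this particular failure may not appear). Do we care about this? Probably not, so let's rewrite the tests: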

In [37]:
assert set(find_common_kmers('aattggcc', 1, 2)) == set(['a', 't', 'g', 'c'])
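
One caveat with `set()`, sketched below: it ignores duplicates as well as order, so it would no longer catch the duplicate-kmer bug we fixed earlier. Comparing sorted lists is one possible alternative (the names `buggy_output` and `expected` are only for illustration):

```python
buggy_output = ['atg', 'atg']   # what the very first version returned
expected = ['atg']

# Converting to sets hides the duplicate, so the buggy output looks fine
print(set(buggy_output) == set(expected))

# Sorting ignores order but still notices the duplicate
print(sorted(buggy_output) == sorted(expected))
```

Whether this matters depends on whether duplicates can realistically come back; with the dict version each kmer is a key, so it can only appear once.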

In [38]:
assert set(find_common_kmers('atgaatgcaaatga', 3, 3)) == set(['atg'])
assert set(find_common_kmers('atgaatgc', 3, 2)) == set(['atg'])
assert set(find_common_kmers('aattggcc', 1, 2)) == set(['a', 't', 'g', 'c'])
assert set(find_common_kmers('tagctagtcg', 10, 1)) == set(['tagctagtcg'])
assert set(find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4)) ==  set(['ct', 'tg', 'gt'])
assert set(find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4)) ==  set(['ccca', 'ccaa', 'caaa', 'aaaa'])
assert set(find_common_kmers('tagctagtcg', 0, 2)) == set([])
assert set(find_common_kmers('tagctagtcg', -3, 2)) == set([])

AssertionError: 

This time we get down to the 7th test before something fails:

In [39]:
find_common_kmers('tagctagtcg', 0, 2)

['']

Same problem, same solution:

In [40]:
def find_common_kmers(dna, k, threshold):
    if k < 1:
        return []
   
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        old_count = kmer2count.get(kmer, 0)
        kmer2count[kmer] = old_count + 1
    
    result = []
    for kmer, count in kmer2count.items():
        if count >= threshold:
            result.append(kmer)
    return result

assert set(find_common_kmers('atgaatgcaaatga', 3, 3)) == set(['atg'])
assert set(find_common_kmers('atgaatgc', 3, 2)) == set(['atg'])
assert set(find_common_kmers('aattggcc', 1, 2)) == set(['a', 't', 'g', 'c'])
assert set(find_common_kmers('tagctagtcg', 10, 1)) == set(['tagctagtcg'])
assert set(find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4)) ==  set(['ct', 'tg', 'gt'])
assert set(find_common_kmers('cccaaaacccaaaacccaaaacccaaaa', 4, 4)) ==  set(['ccca', 'ccaa', 'caaa', 'aaaa'])
assert set(find_common_kmers('tagctagtcg', 0, 2)) == set([])
assert set(find_common_kmers('tagctagtcg', -3, 2)) == set([])

We have re-introduced an old bug by rewriting the code to fix a different problem. This is called a **regression**. 

We have caught the bug before running it on any real life data. 
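
A passing test suite only covers the behaviour we thought to test. As a sketch of one difference the suite above does not exercise: `str.count()` counts non-overlapping occurrences, while the dict version counts every start position. The two function names below are hypothetical, used only to keep both versions side by side.

```python
def find_common_kmers_count(dna, k, threshold):
    # the earlier version, based on str.count()
    if k < 1:
        return []
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result

def find_common_kmers_dict(dna, k, threshold):
    # the refactored version, based on a dict of running totals
    if k < 1:
        return []
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        kmer2count[kmer] = kmer2count.get(kmer, 0) + 1
    return [kmer for kmer, count in kmer2count.items() if count >= threshold]

# 'aa' starts at positions 0, 1 and 2 of 'aaaa', but count() only sees
# the two non-overlapping copies
print(find_common_kmers_count('aaaa', 2, 3))
print(find_common_kmers_dict('aaaa', 2, 3))
```

Which answer is right depends, again, on what we want the output for; once that is decided, it is one more assertion for the suite.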

### Setting up and tearing down

Remember the session on functional programming? Our `find_common_kmers()` function has no **side effects**, which means that it's easy to test. 

Look at a function which does have side effects. We want to take a collection of reads and filter out any that have too many Ns:

In [41]:
def filter_reads(reads, threshold): 
    # iterate over a copy of the reads, so we don't alter the list as we're iterating over it
    for read in list(reads): 
        if read.count('N') >= threshold: 
            reads.remove(read)

Make some reads with 0/1/2 Ns:

In [42]:
reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

Now we'll test that it works with a threshold of one, which should cause the last two reads to be removed:

In [43]:
filter_reads(reads, 1)
assert reads == ['ATCGTAC']

Everything looks OK. Next, we'll test a threshold of two which should remove only the last read:

In [44]:
filter_reads(reads, 2)
assert reads == ['ATCGTAC', 'ACTGNTTACGT']

AssertionError: 

Of course, by the time we get to the second test, the last two reads have already been removed. We need to recreate the reads list each time:

In [45]:
reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
filter_reads(reads, 1)
assert reads == ['ATCGTAC']

reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
filter_reads(reads, 2)
assert reads == ['ATCGTAC', 'ACTGNTTACGT']

reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
filter_reads(reads, 3)
assert reads == ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

Imagine a much more complicated set-up scenario: most of the test code would be set-up. We can move the set-up code into a function:

In [46]:
reads = []
def create_reads():
    global reads
    reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

create_reads()  
filter_reads(reads, 1)
assert reads == ['ATCGTAC']

create_reads()
filter_reads(reads, 2)
assert reads == ['ATCGTAC', 'ACTGNTTACGT']

create_reads()    
filter_reads(reads, 3)
assert reads == ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

This is a bit better (easier to read) but still very repetitive. 
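
Another option, when we are free to change the design, is to remove the side effect. A minimal sketch, assuming a hypothetical `filtered_reads()` that returns a new list instead of modifying its argument:

```python
def filtered_reads(reads, threshold):
    # hypothetical side-effect-free variant: build a new list, leave the input alone
    return [read for read in reads if read.count('N') < threshold]

reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

# No set-up needed between tests, because reads is never modified
assert filtered_reads(reads, 1) == ['ATCGTAC']
assert filtered_reads(reads, 2) == ['ATCGTAC', 'ACTGNTTACGT']
assert filtered_reads(reads, 3) == ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
assert reads == ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']
```

This is not always possible (some functions exist for their side effects), which is where set-up code is needed.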

## Testing with `nose`

`nose` is a testing framework that does all the stuff we've seen, but better. 

### Naming conventions for `nose`

- files that contain test code should have names that start with `test` 
- tests should be in functions whose names start with `test`

In [47]:
def find_common_kmers(dna, k, threshold): 
    result = [] 
    for start in range(len(dna)): 
        kmer = dna[start:start+k] 
        if dna.count(kmer) >= threshold: 
            result.append(kmer) 
    return result 
 

def test_3mers(): 
    assert(find_common_kmers('atgaatgcaaatga', 3, 3) == ['atg']) 

To run tests from the command line:

`nosetests`

and it will find and run all test code.

In [49]:
!nosetests

F
FAIL: test_kmer.test_3mers
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/media/martin/exports/Dropbox/python_for_biologists/current_courses/Advanced Python template/jupyter notebooks/test_kmer.py", line 13, in test_3mers
    assert(find_common_kmers('atgaatgcaaatga', 3, 3)==['atg'])
AssertionError

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (failures=1)


We get

- the name of the test that failed
- the name of the test file
- how many tests were run
- how long they took to run

`nose` provides more specific assertions than plain `assert`:

In [37]:
from nose.tools import assert_equal 

def find_common_kmers(dna, k, threshold): 
    result = [] 
    for start in range(len(dna)): 
        kmer = dna[start:start+k] 
        if dna.count(kmer) >= threshold: 
            result.append(kmer) 
    return result 
 

def test_3mers(): 
    assert_equal(find_common_kmers('atgaatgcaaatga', 3, 3),['atg']) 

In [38]:
!nosetests

F
FAIL: test_kmer.test_3mers
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/martin/Dropbox/Public/python_courses/eg_ap_2015/test_kmer.py", line 13, in test_3mers
    assert_equal(find_common_kmers('atgaatgcaaatga', 3, 3),['atg'])
AssertionError: Lists differ: ['atg', 'atg', 'atg', 'a'] != ['atg']

First list contains 3 additional elements.
First extra element 1:
atg

- ['atg', 'atg', 'atg', 'a']
+ ['atg']

----------------------------------------------------------------------
Ran 1 test in 0.007s

FAILED (failures=1)


Now the output is much more interesting:

- what caused the error (lists differ)
- the expected and actual outputs
- how many additional elements were in the actual output
- the value of the first unexpected element

making it much easier to track down the error. 

Fix the bugs (duplicate output elements and incomplete kmers):

In [None]:
from nose.tools import assert_equal 
 
def find_common_kmers(dna, k, threshold):
    result = []
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        if dna.count(kmer) >= threshold and kmer not in result:
            result.append(kmer)
    return result
 
def test_3mers(): 
    assert_equal(find_common_kmers('atgaatgcaaatga', 3, 3), ['atg']) 

In [None]:
!nosetests

Let's add the rest of the test suite with `assert_equal`:

In [None]:
from nose.tools import assert_equal 
 
def find_common_kmers(dna, k, threshold): 
    result = [] 
    for start in range(len(dna) + 1 - k): 
        kmer = dna[start:start+k] 
        if dna.count(kmer) >= threshold and kmer not in result: 
            result.append(kmer) 
    return result 
 
def test_3mers(): 
    assert_equal(find_common_kmers('atgaatgcaaatga', 3, 3), ['atg'])

def test_low_threshold(): 
    assert_equal(find_common_kmers('atgaatgc', 3, 2) , ['atg']) 

def test_single_bases(): 
    assert_equal(find_common_kmers('aattggcc', 1, 2) , 
                  ['a', 't', 'g', 'c']) 

def test_whole_sequence(): 
    assert_equal(find_common_kmers('tagctagtcg', 10, 1) , ['tagctagtcg']) 

def test_long_sequence(): 
    assert_equal(find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4),
                  ['ct', 'tg', 'gt']) 

def test_long_sequence_4mers(): 
    assert_equal(find_common_kmers('cccaaaacccaaaacccaaaacccaaaa',4, 4) ,
                  ['ccca', 'ccaa', 'caaa', 'aaaa']) 

def test_zero_length_kmer(): 
    assert_equal(find_common_kmers('tagctagtcg', 0, 2) , []) 

def test_negative_length_kmer(): 
    assert_equal(find_common_kmers('tagctagtcg', -3, 2) , []) 

In [50]:
!nosetests

......FF
FAIL: test_kmer.test_zero_length_kmer
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/media/martin/exports/Dropbox/python_for_biologists/current_courses/Advanced Python template/jupyter notebooks/test_kmer.py", line 33, in test_zero_length_kmer
    assert_equal(find_common_kmers('tagctagtcg', 0, 2) , [])
AssertionError: Lists differ: [''] != []

First list contains 1 additional elements.
First extra element 0:


- ['']
+ []

FAIL: test_kmer.test_negative_length_kmer
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/media/martin/exports/Dropbox/python_for_biologists/current_courses/Advanced Python template/jupyter 

Notice how `nose` doesn't quit after the first failed test - it runs them all. We can easily spot the problem (k<1) and fix it.

Now let's try the refactored version that uses a dict:

In [None]:
def find_common_kmers(dna, k, threshold): 
    if k < 1:
        return []

    kmer2count = {} 
    for start in range(len(dna) + 1 - k): 
        kmer = dna[start:start+k] 
        old_count = kmer2count.get(kmer, 0) 
        kmer2count[kmer] = old_count + 1 
 
    result = [] 
    for kmer, count in kmer2count.items(): 
        if count >= threshold: 
            result.append(kmer) 
    return result 

In [51]:
!nosetests

..F.FF..
FAIL: test_kmer.test_single_bases
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/media/martin/exports/Dropbox/python_for_biologists/current_courses/Advanced Python template/jupyter notebooks/test_kmer.py", line 27, in test_single_bases
    ['a', 't', 'g', 'c'])
AssertionError: Lists differ: ['a', 'c', 't', 'g'] != ['a', 't', 'g', 'c']

First differing element 1:
c
t

- ['a', 'c', 't', 'g']
?       -----

+ ['a', 't', 'g', 'c']
?               +++++


FAIL: test_kmer.test_long_sequence
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/media/martin/exports/Dropbox/python_for_biologists/current_courses/Advanced 

As we know, the dict version changes the order of the output, which is easy to see from the failed tests. We could fix this using `set()` as before, but `nose` has a special function for ignoring order:

In [None]:
from nose.tools import assert_items_equal

def test_3mers(): 
    assert_items_equal(find_common_kmers('atgaatgcaaatga', 3, 3), ['atg'])

def test_low_threshold(): 
    assert_items_equal(find_common_kmers('atgaatgc', 3, 2) , ['atg'])

def test_single_bases(): 
    assert_items_equal(find_common_kmers('aattggcc', 1, 2) , ['a', 't', 'g', 'c']) 

def test_whole_sequence(): 
    assert_items_equal(find_common_kmers('tagctagtcg', 10, 1) , ['tagctagtcg']) 

def test_long_sequence(): 
    assert_items_equal(find_common_kmers('ctagctgctcgtgactgtcagtgtacg', 2, 4),  ['ct', 'tg', 'gt']) 

def test_long_sequence_4mers(): 
    assert_items_equal(find_common_kmers('cccaaaacccaaaacccaaaacccaaaa',4, 4) , ['ccca', 'ccaa', 'caaa', 'aaaa']) 

def test_zero_length_kmer(): 
    assert_items_equal(find_common_kmers('tagctagtcg', 0, 2) , []) 

def test_negative_length_kmer(): 
    assert_items_equal(find_common_kmers('tagctagtcg', -3, 2) , []) 

In [52]:
!nosetests

........
----------------------------------------------------------------------
Ran 8 tests in 0.002s

OK
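
The tracebacks above come from Python 2.7. In Python 3's standard library `unittest`, the order-insensitive assertion is named `assertCountEqual`. A minimal sketch using `unittest` directly, on the assumption that you are running Python 3 (how a given `nose` install exposes this name is not shown here):

```python
import unittest

# A bare TestCase instance gives access to the assertion methods
# outside a test class
case = unittest.TestCase()

# Same elements in a different order: passes silently
case.assertCountEqual(['a', 'c', 't', 'g'], ['a', 't', 'g', 'c'])

# Unlike a set() comparison, a duplicated element is still reported
try:
    case.assertCountEqual(['atg', 'atg'], ['atg'])
    duplicate_detected = False
except AssertionError:
    duplicate_detected = True

print(duplicate_detected)
```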


### Setting up and tearing down

For the situation where we need to set up before each test (as with our `filter_reads()` function), use `with_setup()`:

In [None]:
from nose.tools import assert_equal 
from nose.tools import with_setup 
 
def filter_reads(reads, threshold): 
    for read in list(reads): 
        if read.count('N') >= threshold: 
            reads.remove(read) 

reads = [] 
def create_reads(): 
    global reads 
    reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG'] 
 
 
@with_setup(create_reads) 
def test_threshold_one(): 
    filter_reads(reads, 1) 
    assert_equal(reads,  ['ATCGTAC']) 

This is clearer and easier to read. There's also a `teardown` argument if we need to tidy up after a test. 
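
For comparison, a sketch of the same set-up and tear-down idea using the standard library's `unittest` rather than `with_setup()`. The class name `FilterReadsTest` is an assumption for illustration; `nose` also collects tests written this way.

```python
import unittest

def filter_reads(reads, threshold):
    for read in list(reads):
        if read.count('N') >= threshold:
            reads.remove(read)

class FilterReadsTest(unittest.TestCase):
    def setUp(self):
        # runs before every test method, so each test gets a fresh list
        self.reads = ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG']

    def tearDown(self):
        # runs after every test method; nothing to tidy up in this example
        pass

    def test_threshold_one(self):
        filter_reads(self.reads, 1)
        self.assertEqual(self.reads, ['ATCGTAC'])

    def test_threshold_two(self):
        filter_reads(self.reads, 2)
        self.assertEqual(self.reads, ['ATCGTAC', 'ACTGNTTACGT'])

    def test_threshold_three(self):
        filter_reads(self.reads, 3)
        self.assertEqual(self.reads, ['ATCGTAC', 'ACTGNTTACGT', 'ACTGNNTACTG'])

# Run the three tests and keep the result object for inspection
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FilterReadsTest)
result = unittest.TextTestRunner().run(suite)
```

Storing the reads on `self` avoids the `global` statement needed in the function-based version.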

### Different types of assertions

We have seen `assert_equal` and `assert_items_equal`. There are others:

- `assert_true`
- `assert_in`  (is element in a list?)
- `assert_is_instance` (check what class something is)
- `assert_regexp_matches` (for strings)

Why bother using these? They are more readable and give more helpful output. For example, does a function return a valid base?

In [53]:
def foo():
    return 'Q'

In [54]:
from nose.tools import assert_equal
assert_equal(foo() in ['A', 'T', 'G', 'C'], True)

AssertionError: False != True

In [55]:
from nose.tools import assert_true
assert_true(foo() in ['A', 'T', 'G', 'C'])

AssertionError: False is not true

In [44]:
from nose.tools import assert_in
assert_in(foo(), ['A', 'T', 'G', 'C'])

AssertionError: 'Q' not found in ['A', 'T', 'G', 'C']

### Testing numbers

Floating point calculations in Python (and other languages) have limited precision:

In [56]:
0.1 + 0.2

0.30000000000000004

So we can end up with the following situation:

In [58]:
from nose.tools import assert_equal
assert_equal(0.1 + 0.2, 0.3)

AssertionError: 0.30000000000000004 != 0.3

To avoid this, use `assert_almost_equal` when comparing floating point numbers:

In [59]:
from nose.tools import assert_almost_equal
assert_almost_equal(0.1 + 0.2, 0.3)
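
A sketch of what "almost equal" means here, using only the standard library. By default, `unittest`'s `assertAlmostEqual` (which this assertion is assumed to wrap) rounds the difference to 7 decimal places and compares it with zero; `math.isclose` is a related but different check based on a relative tolerance.

```python
import math

total = 0.1 + 0.2

print(total == 0.3)                # exact comparison fails
print(round(total - 0.3, 7) == 0)  # equal to 7 decimal places
print(math.isclose(total, 0.3))    # equal within a relative tolerance
```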

### Testing exceptions

If we add some error checking with exceptions to `find_common_kmers()`:

In [60]:
def find_common_kmers(dna, k, threshold):

    if not isinstance(k, int):
        raise TypeError("k-mer length must be an integer")
    if k < 1:
        raise ValueError("k-mer length must be a positive integer")
    
    result = []
    
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        old_count = kmer2count.get(kmer, 0)
        kmer2count[kmer] = old_count + 1

    
    for kmer, count in kmer2count.items():
        if count >= threshold:
            result.append(kmer)
   
    return result

how can we test it? 

In [61]:
from nose.tools import assert_raises
assert_raises(TypeError, find_common_kmers, "atcgttactaac", 1.5, 2)
assert_raises(ValueError, find_common_kmers, "atcgttcgctaac", -2, 2)
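
Roughly what such a check does, sketched with a plain `try`/`except`. The `raises()` helper is hypothetical (not part of `nose`) and returns a boolean rather than raising `AssertionError` itself; the function is repeated in condensed form so the cell runs on its own.

```python
def find_common_kmers(dna, k, threshold):
    # same checks as above
    if not isinstance(k, int):
        raise TypeError("k-mer length must be an integer")
    if k < 1:
        raise ValueError("k-mer length must be a positive integer")
    kmer2count = {}
    for start in range(len(dna) + 1 - k):
        kmer = dna[start:start+k]
        kmer2count[kmer] = kmer2count.get(kmer, 0) + 1
    return [kmer for kmer, count in kmer2count.items() if count >= threshold]

def raises(expected_exception, func, *args):
    # hypothetical helper: True if func(*args) raises expected_exception,
    # False if it returns normally; any other exception propagates
    try:
        func(*args)
    except expected_exception:
        return True
    return False

assert raises(TypeError, find_common_kmers, "atcgttactaac", 1.5, 2)
assert raises(ValueError, find_common_kmers, "atcgttcgctaac", -2, 2)
assert not raises(ValueError, find_common_kmers, "atcgttactaac", 2, 2)
```

The last line checks the opposite case: valid input should not raise.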

## Exercises

**no solutions to these**

Pick a piece of your own code, or one of the solutions to previous exercises. Write a test suite for it using either `assert` or `nose`. 

Think about the smallest bits of functionality that you can test. 

How might you redesign the code to make testing easier?
