Skip to content

Latest commit

 

History

History
422 lines (295 loc) · 18.5 KB

testing.rst

File metadata and controls

422 lines (295 loc) · 18.5 KB

Testing your code

Suppose you collaborate on writing code: how do you make sure none of your collaborators break your code? The answer: introduce quality controls. We can do this by testing the code frequently. To make sure you do this (and do it consistently), we should automate this process, so that we can make sure it happens every time that the code is committed to your repository.

Note

As with other parts of this book, the tools and examples in this chapter use python as an example. The principles which are being introduced are very general, however, and the vast majority of mature programming languages will have testing toolkits available or built-in.

Static Analysis

The simplest level of quality control which most large coding projects expect (or sometimes demand) is that your code follows a given "style guide". Much like how different publishers will have a view on whether an article should (or should not) make use of the serial (so-called "Oxford) comma, project maintainers will often have strong feelings about the formatting of code in their codebase. This can be both for performance reasons, and to make reading the code easier.

While it's less important to do this if you're working along on your own code, it can still be helpful to maintain some conventions (such as capitalising the names of classes, in the case of Python). A popuarly used style guide for Python is PEP8, which aims to make your code as readible as possible.

Enforcing, or checking that code follows a given set of guidelines can be done with tools which perform "static analysis" on the code. These read and parse the code, but they don't actually run it. They then check the code against a style guide, and against the grammar rules for the language, and attempt to indentify problems.

Two of the most popular analysis tools for Python are Pylint and flake8. These have overlapping feature sets, but running both can sometimes be helpful.

To get to grips with what these do, let's take two different code examples; the first one has good style, and the other bad.

.. literalinclude:: goodstyle.py
   :language: python
   :linenos:

.. literalinclude:: badstyle.py
   :language: python
   :linenos:

Pylint

Pylint tries to identify a series of stylistic errors in your code, and then provide a score at the end.

The code in goodstyle.py does pretty well:

$ pylint goodstyle.py

------------------------------------
Your code has been rated at 10.00/10

But we score a 0/10 for our badstyle.py code...

$ pylint badstyle.py

bad-style.py:5:0: C0303: Trailing whitespace (trailing-whitespace)
bad-style.py:10:0: C0303: Trailing whitespace (trailing-whitespace)
bad-style.py:1:0: C0114: Missing module docstring (missing-module-docstring)
bad-style.py:2:0: C0103: Function name "HelloWorld" doesn't conform to snake_case naming style (invalid-name)
bad-style.py:2:0: C0116: Missing function or method docstring (missing-function-docstring)

------------------------------------
Your code has been rated at 0.00/10

The linter tells us which line the problem arises in (the first warning is for line 5, for example), and then gives a short description of how the code is breaking the style guide.

flake8

The good code provides no output, as no problems are found.

$ flake8 goodstyle.py

For the badly styled code we get a series of errors and warnings.

$ flake8 badstyle.py

bad-style.py:2:3: E999 IndentationError: unexpected indent
bad-style.py:2:4: E111 indentation is not a multiple of four
bad-style.py:2:4: E113 unexpected indentation
bad-style.py:2:25: E251 unexpected spaces around keyword / parameter equals
bad-style.py:2:27: E251 unexpected spaces around keyword / parameter equals
bad-style.py:5:1: W293 blank line contains whitespace
bad-style.py:6:7: E303 too many blank lines (3)
bad-style.py:6:7: E111 indentation is not a multiple of four
bad-style.py:7:11: E111 indentation is not a multiple of four
bad-style.py:7:16: E225 missing whitespace around operator
bad-style.py:8:7: E111 indentation is not a multiple of four
bad-style.py:9:11: E111 indentation is not a multiple of four
bad-style.py:9:16: E225 missing whitespace around operator
bad-style.py:11:7: E111 indentation is not a multiple of four

You can see that in comparison to pylint there's a lot of output, and a lot of it relates to spacing and line breaks. I'll pull-out the E111 indentation is not a multiple of four lines for special mention. These are grumbling about the use of tabs to indent code rather than four spaces; either will work (though spaces are generally preferred in most style guides). However, if you're using python3 then you can't mix both indenting styles in the same file.

Unit testing

While static analysis can show up problems with the syntax or the style of your code, most bugs won't show-up until you've actually executed the code. There are a few approaches to testing your code for these sorts of bugs. The simplest is called "unit testing", where individual units of code are run, and the behaviour of the unit is examined. In Python a unit is likely to be a function or a class method.

Units are tested in isolation from each other, which allows problems to be identified quickly, but generally aren't a realistic representation of how code is used; I'll cover the additional methods for this later in this chapter.

There are a number of tools in Python to help you create tests; I'll use the unittest library here, because it's part of the standard python libraries which should be present in any python installation. Other tools, such as Pytest make some things simpler, however.

Let's write a function which we want to test; this is one which checks if a given input is prime.

.. literalinclude:: examples/prime.py
   :language: python

Now, this is a very simple function, which just searches for factors of a number. If it can't find any it returns True, since that number is prime.

We want to test this; there are some things we might regard as "edge cases" which we should look out for, and we should check that it actually identifies a prime number correctly. Here are some things I can think of which we might want to check are handled correctly:

  • Is 2 identified as prime?
  • Is 1 identified as not prime?
  • What happens when 0 or a negative number is provided?
  • What happens if the input isn't an integer?

Before we even start writing tests we can see that we actually need to put some thought into answering some of these, especially the last two. We need to define a behaviour in both cases, since it's likely that most approaches we'll take to writing an algorithm won't cover these. This highlights one of the reasons that writing tests is important: they can act like a design specification for your code. In fact, practitioners of test-driven development would suggest that you should write the tests before you write any code.

Test-driven development

There's a lot of merit in this approach, but it can be difficult to get started. In this chapter I'm starting with a simple unit of code, but once you understand the basics of writing tests I would strongly encourage you to introduce test-driven principles into some or all of your development.

Let's look at an example test.

def is_prime(number):
 """Test if a number is prime. Return True if prime, and False if not."""
 for i in range(number):
     if number % 2 == 0:
         return False
     elif (num % i) == 0:
         return False
     else:
         return True

So, what's going on here? Well, fundamentally this file is just a python script, and we've added in some boilerplate to help the unittest tool do its job. The first thing is importing both unittest and our function (which lives in a file called prime.py). The tool groups together tests into classes, with each individual test defined by a method on that class. The class needs to inherit the unittest.TestCase class.

Note

Don't worry if you're not too clear on what's going on here with things like class inheritence, it just gives our testing class access to a number of additional tools from the library. For now you can just modify these samples, and as you get more confident in what you're doing you can consult the documentation on writing more complex tests.

The code which defines the test is contained in the test_number_two method. All that is does is evaluate the is_prime function, and checks that it returns True. The assertTrue method is one of the methods which unittest provides to help us; if the result is False it will cause the test to fail.

Let's run the test. I've saved the script above in a file called test_prime.py. In order to run the test I need to call python on it with an additional option, -m unittest:

$ python -m unittest test-prime.py

F
======================================================================
FAIL: test_number_two (test-prime.PrimeTests)
Test that two is prime.
----------------------------------------------------------------------
Traceback (most recent call last):
File "test-prime.py", line 9, in test_number_two
self.assertTrue(is_prime(2))
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)

Well, you can see that we didn't exactly pass with flying colours... Looking back at the code we can see that it will reject all even numbers as non-prime; we need to handle 2 differently.

Adding a case in the if statement for 2 is one way to do this:

def is_prime(number):
   """Test if a number is prime. Return True if prime, and False if not."""
   if (number == 2):
       return True
   for i in range(number):
       if number % 2 == 0:
           return False
       elif (num % i) == 0:
           return False
       else:
           return True

Now our test gives us:

$ python -m unittest test-prime.py

.
----------------------------------------------------------------------
Ran 1 test in 0.000s

OK

Which is good! We've fixed a problem in our code! Let's go on to write tests for the remaining conditions I identified earlier.

import unittest
from prime import is_prime

class PrimeTests(unittest.TestCase):
    """Test prime number checking."""

    def test_number_two(self):
        """Test that two is prime."""
        self.assertTrue(is_prime(2))

    def test_number_one(self):
        """Test that one is not prime."""
        self.assertFalse(is_prime(1))

    def test_number_zero(self):
        """Test that zero is not prime."""
        self.assertFalse(is_prime(0))

    def test_number_negative(self):
        """Test that negative numbers are not prime."""
        self.assertFalse(is_prime(-3))

    def test_actual_prime(self):
        """Test that a known prime is prime."""
        self.assertTrue(is_prime(13))

Running this we find that there are still problems:

E.E..
======================================================================
ERROR: test_actual_prime (test-prime.PrimeTests)
Test that a known prime is prime.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test-prime.py", line 25, in test_actual_prime
    self.assertTrue(is_prime(13))
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (num % i) == 0:
NameError: name 'num' is not defined

======================================================================
ERROR: test_number_one (test-prime.PrimeTests)
Test that one is not prime.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test-prime.py", line 13, in test_number_one
    self.assertFalse(is_prime(1))
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (num % i) == 0:
NameError: name 'num' is not defined

----------------------------------------------------------------------
Ran 5 tests in 0.001s

FAILED (errors=2)

The first line here, E.E.. reveals that two of the tests had errors; the first and third ones which were run. Then there are individual messages which reveal why the test couldn't finish; there's a bug in the code! num isn't defined. If we fix this (changing num to number) and run again... only to find another error.

$ python -m unittest test-prime.py

E.E..
======================================================================
ERROR: test_actual_prime (test-prime.PrimeTests)
Test that a known prime is prime.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test-prime.py", line 25, in test_actual_prime
    self.assertTrue(is_prime(13))
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (number % i) == 0:
ZeroDivisionError: integer division or modulo by zero

======================================================================
ERROR: test_number_one (test-prime.PrimeTests)
Test that one is not prime.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/daniel/academic-notes/notes-software/examples/test-prime.py", line 13, in test_number_one
    self.assertFalse(is_prime(1))
  File "/home/daniel/academic-notes/notes-software/examples/prime.py", line 8, in is_prime
    elif (number % i) == 0:
ZeroDivisionError: integer division or modulo by zero

----------------------------------------------------------------------
Ran 5 tests in 0.001s

FAILED (errors=2)

This one's a result of choosing to use range(number); we can update this to range(2, number) and get...

.....
----------------------------------------------------------------------
Ran 5 tests in 0.000s

OK

We've passed our tests!

Non-integers

There are a couple of plausible approaches for this which I'll lay out. One option is that you have your function raise a ValueError if type(number) != int, and then test using an assertRaise that the correct exception gets raised. Alternatively you could define anything else as non-prime; the best solution will vary depending on the scenario, though I'd generally favour the former.

I've still not included a test for non-integer input, but that one, in classical academic style, is left as an exercise to the reader.

As your codebase evolves you're likely to end-up with large amounts of code you'll need to test. A good practice here is to keep all your test scripts in a directory called tests within your project repositorty. It can also be a good idea to separate the unit tests into their own tests/unit directory.

unittest can then be used to run all the tests in that directory using the command

$ python -m unittest discover -s tests/unit

For this to work the scripts need to have names of the form test_foo; the individual test classes in those scripts should have names starting in Test, such as TestSpam, and then the individual test methods should have names like test_eggs.

Coverage

Unit testing is most useful when all of the code in the codebase is checked by your test suite. Code coverage is a measure of the fraction of lines of the code which are run by a given test suite.

A tool called coverage.py can be used to assess this for the unittests in this section. You can install it by running

$ pip install coverage

and then changing the way that you call your tests to

$ coverage -m unittest test-prime.py

This will gather the information which is needed to measure the coverage, and a report can be made by running

$ coverage report -m
Name            Stmts   Miss  Cover   Missing
---------------------------------------------
prime.py            9      2    78%   7, 9
test-prime.py      13      0   100%
---------------------------------------------
TOTAL              22      2    91%

So we can see lines 7 and 9 are never called by our test suite as it currently stands.

End-to-end testing

Testing that individual functions produce the results we expect is a powerful technique, but it doesn't reflect the way that code actually runs. End-to-end, or integration testing is used to ensure that several units of code interact the way that we expect.

Writing integration tests doesn't differ much from writing a unit test, but we need to include all the components of the workflow. This is why separating these from unittests can be sensible; your code might take minutes or hours to run its integration tests, and running them every time we make a change might be time consuming. Instead, by keeping them in e.g. a tests/integration directory we can run them with

$ python -m unittest discover -s tests/integration

Regression testing

When you add new features or tidy-up (refactor) code, you want to ensure that it continues to provide the same results as older versions of the code.

This is where regression testing comes into play. The way that you go about writing this type of test will vary depending on how your code works, but also if you have external requirements, such as whether the code needs to be able to reproduce reviewed results.

For example, if you have a set of reviewed results, you could write a test script which attempts to reproduce these under the same configuration, and checks that the results are still produced after the code is updated.