# Fuzzing Test Suites with Mutation Analysis

In the [chapter on coverage](Coverage.ipynb), we showed how one can identify which parts of a program are executed by a test suite, and hence get a sense of how effectively a set of test cases covers the program structure. However, is structural coverage a good measure of effectiveness? One problem with structural coverage measures is that they fail to check whether the program executions generated by the test suite were actually correct. That is, for coverage, an execution that produces a wrong output that goes unnoticed by the test suite counts exactly the same as an execution that produces the right output. Indeed, if one deletes the assertions in a typical test case, the coverage does not change for the new test suite, but the new test suite is much less useful than the original one.

This is not an optimal state of affairs. How can we verify that our tests are actually useful? One alternative (hinted at in the chapter on coverage) is to inject bugs into the program, and evaluate the effectiveness of test suites in catching these injected bugs. However, that introduces another problem: how do we produce these bugs in the first place? Any manual effort is likely to be biased by the preconceptions of the developer as to where the bugs are likely to occur, and what effects they would have. Further, writing good bugs is likely to take a significant amount of time, for a very indirect benefit. Hence, such a solution is not sufficient.

Mutation Analysis offers an alternative solution. The insight from Mutation Analysis is to consider the probability of inserting a bug from the perspective of a programmer. If one assumes that each program element receives a sufficiently similar amount of attention, one can further assume that each token in the program has a similar probability of being incorrectly transcribed. Of course, the programmer will correct any mistakes that get detected by the compiler (or other static analysis tools). So the set of valid tokens that differ from the original and make it past the compilation stage is considered to be the set of possible _mutations_ that represent the _probable faults_ in the program. A test suite is then judged by its capability to detect (and hence prevent) such mutations. The proportion of such mutants detected over all _valid_ mutants produced is taken as the mutation score. In this chapter, we see how one can implement Mutation Analysis for Python programs. The mutation score obtained represents the ability of a program analysis tool to prevent faults, and can be used to judge static test suites, test generators such as fuzzers, and also static analysis and symbolic execution frameworks.

It might be intuitive to consider a slightly different perspective. A test suite is a program that accepts as its input the program to be tested. What is the best way to evaluate such a program (the test suite)? We can essentially *fuzz* the test suite by applying small mutations to the input program, and verifying that the test suite in question does not produce unexpected behaviors. The test suite is supposed to let only the original through; hence, any mutant that is not detected as faulty represents a bug in the test suite.

**Prerequisites**

* You need some understanding of how a program is executed.
* You should have read [the chapter on coverage](Coverage.ipynb).

## Is Structural Coverage Adequacy Sufficient?

Consider the `triangle()` program below. We want to verify that the program works correctly.

In [None]:
def triangle(a, b, c):
    if a == b:
        if b == c:
            return 'Equilateral'
        else:
            return 'Isosceles'
    else:
        if b == c:
            return "Isosceles"
        else:
            if a == c:
                return "Isosceles"
            else:
                return "Scalene"

Here are a few test cases to ensure that the program works.

In [None]:
def strong_oracle(fn):
    assert fn(1,1,1) == 'Equilateral'

    assert fn(1,2,1) == 'Isosceles'
    assert fn(2,2,1) == 'Isosceles'
    assert fn(1,2,2) == 'Isosceles'

    assert fn(1,2,3) == 'Scalene'

What is the effectiveness of our test suite? As we saw in the [chapter on coverage](Coverage.ipynb), one can use structural coverage techniques such as statement coverage to obtain a measure of effectiveness of the test case.

In [None]:
import fuzzingbook_utils

In [None]:
from Coverage import Coverage

In [None]:
import inspect

We add a function `show_coverage()` to visualize the coverage obtained.

In [None]:
class Coverage(Coverage):
    def show_coverage(self, fn):
        src = inspect.getsource(fn)
        name = fn.__name__
        covered = set([lineno for method, lineno in self._trace if method == name])
        for i, s in enumerate(src.split('\n')):
            print('%s %2d: %s' % ('#' if i + 1 in covered else ' ', i + 1, s))

In [None]:
with Coverage() as cov:
    strong_oracle(triangle)

In [None]:
cov.show_coverage(triangle)

Our `strong_oracle()` seems to have adequately covered all possible conditions.
That is, our set of test cases is reasonably good according to structural coverage. However, does the coverage obtained tell the whole story? Consider this test suite instead:

In [None]:
def weak_oracle(fn):
    assert fn(1,1,1) == 'Equilateral'

    assert fn(1,2,1) != 'Equilateral'
    assert fn(2,2,1) != 'Equilateral'
    assert fn(1,2,2) != 'Equilateral'

    assert fn(1,2,3) != 'Equilateral'

All that we are checking here is that a triangle with unequal sides is not equilateral. What is the coverage obtained?

In [None]:
with Coverage() as cov:
    weak_oracle(triangle)

In [None]:
cov.show_coverage(triangle)

Indeed, there does not seem to be _any_ difference in coverage.
The `weak_oracle()` obtains exactly the same coverage as `strong_oracle()`. Yet a moment's reflection should convince one that `weak_oracle()` is not as effective as `strong_oracle()`; _coverage_ is simply unable to distinguish between the two test suites. What are we missing in coverage?
The problem here is that coverage is unable to evaluate the _quality_ of our assertions. Indeed, coverage does not care about assertions at all. However, as we saw above, assertions are an extremely important part of test suite effectiveness. Hence, what we need is a way to evaluate the quality of assertions.
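
The same observation can be reproduced without the `Coverage` class. The following is a minimal, self-contained sketch that records executed lines with `sys.settrace()`; the function `classify()` and the two oracles `strong()` and `weak()` are hypothetical stand-ins for `triangle()` and its test suites.

```python
import sys

def classify(n):
    if n < 0:
        return 'negative'
    return 'non-negative'

def covered_lines(test):
    """Run `test` and return the lines of classify() that were executed,
    as offsets from the `def` line."""
    lines = set()

    def tracer(frame, event, arg):
        if frame.f_code is classify.__code__ and event == 'line':
            lines.add(frame.f_lineno - classify.__code__.co_firstlineno)
        return tracer

    old_tracer = sys.gettrace()
    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(old_tracer)  # always restore the previous tracer
    return lines

def strong():
    # checks the actual results
    assert classify(-1) == 'negative'
    assert classify(1) == 'non-negative'

def weak():
    # executes the same code, but checks almost nothing
    assert classify(-1) is not None
    assert classify(1) is not None

same_coverage = covered_lines(strong) == covered_lines(weak)
print(same_coverage)
```

Both oracles execute the same lines, so a line-based measure cannot tell them apart, even though only `strong()` checks the returned values.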

## Fault Injection

Notice that in the [chapter on coverage](Coverage.ipynb), coverage was presented as a _proxy_ for the likelihood of a test suite to uncover bugs. What if we actually try to evaluate the likelihood of a test suite to uncover bugs? All we need to do is inject bugs into the program, one at a time, and count the number of such bugs that our test suite detects. The frequency of detection provides us with the actual likelihood of the test suite to uncover bugs. This technique is called _fault injection_. Here is an example of _fault injection_.

In [None]:
def triangle_m1(a, b, c):
    if a == b:
        if b == c:
            return 'Equilateral'
        else:
            #return 'Isosceles'
            return None #<-- injected fault
    else:
        if b == c:
            return "Isosceles"
        else:
            if a == c:
                return "Isosceles"
            else:
                return "Scalene"

Let us see if our test suites are good enough to catch this fault. We first check whether `weak_oracle()` can detect this change.

In [None]:
from ExpectError import ExpectError

In [None]:
with ExpectError():
    weak_oracle(triangle_m1)

The `weak_oracle()` is unable to detect any changes. What about our `strong_oracle()`?

In [None]:
with ExpectError():
    strong_oracle(triangle_m1)

Our `strong_oracle()` is able to detect this fault, which is evidence that `strong_oracle()` is probably a better test suite.

_Fault injection_ can provide a good measure of the effectiveness of a test suite, provided we have a list of possible faults. The problem is that collecting such a set of _unbiased_ faults is rather expensive. It is difficult to create good faults that are reasonably hard to detect, and it is a manual process. Given that it is a manual process, the generated faults will be biased by the preconceptions of the developer who creates them. Even when such curated faults are available, they are unlikely to be exhaustive, and are likely to miss important classes of bugs and parts of the program. Hence, _fault injection_ is an insufficient replacement for coverage. Can we do better?
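
To make "frequency of detection" concrete, here is a small sketch under stated assumptions: the function `maximum()`, its three hand-written faulty variants, and the two oracles are all hypothetical and only serve to illustrate the computation.

```python
def maximum(a, b):
    return a if a > b else b

# Hand-written faulty variants (hypothetical injected faults)
def maximum_f1(a, b):
    return a if a < b else b  # comparison flipped

def maximum_f2(a, b):
    return a                  # second argument ignored

def maximum_f3(a, b):
    return b                  # first argument ignored

FAULTS = [maximum_f1, maximum_f2, maximum_f3]

def detects(oracle, fn):
    """Return True if the oracle raises an AssertionError for fn."""
    try:
        oracle(fn)
    except AssertionError:
        return True
    return False

def detection_rate(oracle, faults):
    """Fraction of the given faulty variants that the oracle detects."""
    return sum(detects(oracle, f) for f in faults) / len(faults)

def strong(fn):
    assert fn(1, 2) == 2
    assert fn(2, 1) == 2

def weak(fn):
    assert fn(1, 2) >= 1

print(detection_rate(strong, FAULTS), detection_rate(weak, FAULTS))
```

The result depends entirely on which faults were written by hand, which is exactly the bias discussed above.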

Mutation Analysis provides an alternative to a curated set of faults. The key insight is that, if one assumes that the programmer understands the program in question, the majority of errors made are very likely small transcription errors (involving a small number of tokens). A compiler will likely catch most of these errors. Hence, the majority of residual faults in a program are likely to be due to small (single-token) variations from the correct program at certain points in its structure. (This particular assumption is called the *Competent Programmer Hypothesis* or the *Finite Neighborhood Hypothesis*.)

What about larger faults composed of multiple changes to the program? The key insight here is that, for a significant majority of such faults, test cases that can detect a single change in isolation are very likely to detect the larger composite fault that contains it. (This assumption is called the *Coupling Effect*.)

How can we use these assumptions in practice? The idea is to simply generate *all* possible *valid* variants of the program that differ from the original by a small change, such as a single-token change. (Such variants are called *mutants*.) Next, the given test suite is applied to each variant thus generated. Any mutant detected by the test suite is said to have been *killed* by the test suite. The effectiveness of a test suite is given by the ratio of mutants killed to valid mutants generated.
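
Before building a framework on top of the AST, the whole idea can be sketched with plain string replacement. The sketch below is an illustration only: the source `SOURCE`, the replacement tokens, and the helper names are assumptions made for this example. Variants that do not compile are discarded as invalid; the remaining ones are the mutants.

```python
SOURCE = "def inc(x):\n    return x + 1\n"
REPLACEMENTS = ['-', '*', '//', ')', '+ *']  # some of these yield invalid code

def valid_mutants(src, token, replacements):
    """Replace the first occurrence of `token` by each replacement,
    keeping only the variants that still compile."""
    mutants = []
    for replacement in replacements:
        variant = src.replace(token, replacement, 1)
        try:
            compile(variant, '<mutant>', 'exec')
        except SyntaxError:
            continue  # the "compiler" catches this one; not a valid mutant
        mutants.append(variant)
    return mutants

def killed(variant, oracle):
    """Run the oracle against the variant; any exception kills the mutant."""
    namespace = {}
    exec(variant, namespace)
    try:
        oracle(namespace['inc'])
    except Exception:
        return True
    return False

def mutation_score(mutants, oracle):
    return sum(killed(m, oracle) for m in mutants) / len(mutants)

def strong(inc):
    assert inc(1) == 2

def weak(inc):
    assert inc(1) > 0

mutants = valid_mutants(SOURCE, '+', REPLACEMENTS)
print(len(mutants), mutation_score(mutants, strong), mutation_score(mutants, weak))
```

Of the five candidate replacements, three survive compilation. The `strong()` oracle kills all three, while `weak()` kills only the `x - 1` variant.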

We next implement a simple mutation analysis framework and use it to evaluate our test suites.

## Simple Mutator for Functions

Consider the `triangle()` program we discussed previously. A simple way to produce a valid mutated version of this program is to replace some of its statements by `pass`.

We begin by importing the AST manipulation modules.

In [None]:
import ast

In [None]:
import astunparse

The `MuFunctionAnalyzer` is the main class responsible for mutation analysis of the test suite. It accepts the function to be tested, and normalizes its source code by parsing and unparsing it once. This ensures that later `diff`s between the original and a mutant are not derailed by differences in whitespace, comments, etc.
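
The effect of this normalization can be seen in isolation. The sketch below assumes Python 3.9 or later and uses the standard library's `ast.unparse()` in place of `astunparse.unparse()`; the two source strings are made up for the example.

```python
import ast

def normalize(src):
    """Parse and unparse once, dropping comments and layout differences."""
    return ast.unparse(ast.parse(src))

# Two spellings of the same function (hypothetical inputs)
a = "def f(x):\n    return x+1   # add one\n"
b = "def f( x ):\n\n    return (x + 1)\n"

print(normalize(a))
print(normalize(a) == normalize(b))
```

Since comments, blank lines, and redundant parentheses are not part of the AST, both strings normalize to the same text, and a later diff shows only the mutation itself.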

In [None]:
class MuFunctionAnalyzer:
    def __init__(self, fn, log=False):
        self.fn = fn
        self.name = fn.__name__
        src = inspect.getsource(fn)
        self.ast = ast.parse(src)
        self.src = astunparse.unparse(self.ast) # normalize
        self.mutator = self.mutator_object()
        self.nmutations = self.get_mutation_count()
        self.un_detected = set()
        self.mutants = []
        self.log = log
        
    def mutator_object(self, locations=None):
        return StmtDeletionMutator(locations)
    
    def register(self, m):
        self.mutants.append(m)
        
    def finish(self):
        pass

The `get_mutation_count()` fetches the number of possible mutations available. We will see later how this can be implemented.

In [None]:
class MuFunctionAnalyzer(MuFunctionAnalyzer):
    def get_mutation_count(self):
        self.mutator.visit(self.ast)
        return self.mutator.count

The `Mutator` provides the base class for implementing individual mutations. It accepts the location to mutate. It assumes that the method `mutable_visit()` is invoked on all nodes of interest as determined by the subclass. When the `Mutator` is invoked without a location to mutate, it simply loops through all possible mutation points and retains a count in `self.count`. If it is invoked with a specific location to mutate, the `mutable_visit()` method calls `mutation_visit()`, which performs the mutation on the node. In this simple mutator, each location produces exactly one mutation.

In [None]:
class Mutator(ast.NodeTransformer):
    def __init__(self, mutate_location=-1):
        self.count = 0
        self.mutate_location = mutate_location

    def mutable_visit(self, node):
        self.count += 1  # mutation locations are numbered from 1
        if self.count == self.mutate_location:
            return self.mutation_visit(node)
        return self.generic_visit(node)

The `StmtDeletionMutator` simply hooks into all the statement processing visitors. It performs mutation by replacing the given statement with `pass`.

In [None]:
class StmtDeletionMutator(Mutator):
    def mutation_visit(self, node): return ast.Pass()
    def visit_Return(self, node): return self.mutable_visit(node)
    def visit_Delete(self, node): return self.mutable_visit(node)

    def visit_Assign(self, node): return self.mutable_visit(node)
    def visit_AnnAssign(self, node): return self.mutable_visit(node)
    def visit_AugAssign(self, node): return self.mutable_visit(node)

    def visit_Raise(self, node): return self.mutable_visit(node)
    def visit_Assert(self, node): return self.mutable_visit(node)

    def visit_Global(self, node): return self.mutable_visit(node)
    def visit_Nonlocal(self, node): return self.mutable_visit(node)

    def visit_Expr(self, node): return self.mutable_visit(node)

    def visit_Pass(self, node): return self.mutable_visit(node)
    def visit_Break(self, node): return self.mutable_visit(node)
    def visit_Continue(self, node): return self.mutable_visit(node)
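
The two modes of operation (counting and mutating) can be tried out on a stripped-down, self-contained variant. The class `ReturnDeleter` below is a hypothetical simplification that only handles `return` statements, and it uses `ast.unparse()` (Python 3.9+) instead of `astunparse`.

```python
import ast

class ReturnDeleter(ast.NodeTransformer):
    """Counts `return` statements; replaces the one at `mutate_location` by `pass`."""
    def __init__(self, mutate_location=-1):
        self.count = 0
        self.mutate_location = mutate_location

    def visit_Return(self, node):
        self.count += 1  # locations are numbered from 1
        if self.count == self.mutate_location:
            return ast.Pass()
        return self.generic_visit(node)

SRC = "def sign(x):\n    if x < 0:\n        return -1\n    return 1\n"

# Counting pass: no location given, so nothing is changed
counter = ReturnDeleter()
counter.visit(ast.parse(SRC))
n = counter.count

# Mutating pass: replace the second `return` by `pass`
mutant_src = ast.unparse(ReturnDeleter(2).visit(ast.parse(SRC)))
print(n)
print(mutant_src)
```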

We can obtain the number of mutations produced for `triangle()` as follows.

In [None]:
MuFunctionAnalyzer(triangle).nmutations

We need a way to obtain the individual mutants. For this, we convert our `MuFunctionAnalyzer` to an *iterable*.

In [None]:
class MuFunctionAnalyzer(MuFunctionAnalyzer):
    def __iter__(self):
        return PMIterator(self)

The `PMIterator`, which is the *iterator* class for `MuFunctionAnalyzer`, is defined as follows.

In [None]:
class PMIterator:
    def __init__(self, pm):
        self.pm = pm
        self.idx = 0

The `__next__()` method returns the next `Mutant`.

In [None]:
class PMIterator(PMIterator):
    def __next__(self):
        i = self.idx
        if i >= self.pm.nmutations:
            self.pm.finish()
            raise StopIteration()
        self.idx += 1
        mutant = Mutant(self.pm, self.idx, log=self.pm.log)
        self.pm.register(mutant)
        return mutant
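
The split between an iterable and its iterator, with a hook that fires when the iteration is exhausted, can be sketched in a few lines. The classes `Batch` and `BatchIterator` are hypothetical and only mirror the structure used above.

```python
class Batch:
    """Iterable that is notified when a loop over it runs to completion."""
    def __init__(self, n):
        self.n = n
        self.finished = False

    def finish(self):
        self.finished = True

    def __iter__(self):
        return BatchIterator(self)

class BatchIterator:
    def __init__(self, batch):
        self.batch = batch
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx >= self.batch.n:
            self.batch.finish()  # called only when the loop is exhausted
            raise StopIteration
        self.idx += 1
        return self.idx

batch = Batch(3)
items = [i for i in batch]

interrupted = Batch(3)
for i in interrupted:
    break  # leaving early means finish() is never called
```

One consequence of this design is that a loop that exits early with `break` never triggers `finish()`, so results should only be read after a complete iteration.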

The `Mutant` class contains logic for generating mutants when given the locations to mutate.

In [None]:
class Mutant:
    def __init__(self, pm, location, log=False):
        self.pm = pm
        self.i = location
        self.name = "%s_%s" % (self.pm.name, self.i)
        self._src = None
        self.tests = []
        self.detected = False
        self.log = log

Here is how it can be used:

In [None]:
for m in MuFunctionAnalyzer(triangle):
    print(m.name)

The `generate_mutant()` method simply calls `mutator_object()` to obtain a mutator, and passes the mutator a copy of the AST.

In [None]:
class Mutant(Mutant):
    def generate_mutant(self, location):
        mutant_ast = self.pm.mutator_object(location).visit(ast.parse(self.pm.src)) # copy
        return astunparse.unparse(mutant_ast)
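
The fresh parse matters because `ast.NodeTransformer` modifies the tree it visits in place. The following sketch illustrates this with a hypothetical transformer `ReturnToPass`; it uses `copy.deepcopy()` as one way of obtaining a copy, where the code above re-parses the source instead.

```python
import ast
import copy

class ReturnToPass(ast.NodeTransformer):
    def visit_Return(self, node):
        return ast.Pass()

SRC = "def f(x):\n    return x + 1\n"
tree = ast.parse(SRC)
pristine = ast.dump(tree)

# Transforming a copy leaves the original tree untouched
ReturnToPass().visit(copy.deepcopy(tree))
unchanged_after_copy = ast.dump(tree) == pristine

# Transforming the tree itself changes it for every later use
ReturnToPass().visit(tree)
changed_in_place = ast.dump(tree) != pristine

print(unchanged_after_copy, changed_in_place)
```

Without a copy, each mutant would be built on top of the previous one, rather than on the original program.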

The `src()` method returns the mutated source.

In [None]:
class Mutant(Mutant):
    def src(self):
        if self._src is None:
            self._src = self.generate_mutant(self.i)
        return self._src

Here is how one can obtain the mutants, and visualize the difference from the original:

In [None]:
import difflib

In [None]:
for mutant in MuFunctionAnalyzer(triangle):
    for line in difflib.unified_diff(mutant.pm.src.split('\n'),
                                  mutant.src().split('\n'),
                                  fromfile=mutant.pm.name,
                                  tofile=mutant.name, n=3):
        print(line)
    break

We add the `diff()` method to `Mutant` so that it can be called directly.

In [None]:
class Mutant(Mutant):
    def diff(self):
        return '\n'.join(difflib.unified_diff(self.pm.src.split('\n'),
                                              self.src().split('\n'),
                                              fromfile='original',
                                              tofile='mutant',
                                              n=3))
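
For reference, here is what `difflib.unified_diff()` produces for a single deleted statement. The two source strings are hypothetical. Unlike the method above, this sketch passes `lineterm=''`, so that the header lines do not carry an extra newline when the result is joined.

```python
import difflib

original = "def f(x):\n    return x + 1\n"
mutant = "def f(x):\n    pass\n"

diff_lines = list(difflib.unified_diff(original.split('\n'),
                                       mutant.split('\n'),
                                       fromfile='original',
                                       tofile='mutant',
                                       lineterm='',
                                       n=3))
print('\n'.join(diff_lines))
```

Lines starting with `-` come from the original and lines starting with `+` from the mutant; a statement deletion therefore shows up as one `-` line paired with one `+    pass` line.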

We are now ready to implement the actual evaluation. We define our mutant as a context manager that checks whether all given assertions succeed; a failing assertion marks the mutant as detected.

In [None]:
class Mutant(Mutant):
    def __enter__(self):
        if self.log:
            print('->\t%s' % self.name)
        c = compile(self.src(), '<mutant>', 'exec')
        exec(c, globals())  # rebind the function under test to the mutant

    def __exit__(self, exc_type, exc_value, traceback):
        if self.log:
            print('<-\t%s' % self.name)
        if exc_type is not None:
            self.detected = True
            if self.log:
                print("Detected %s" % self.name, exc_type, exc_value)
        globals()[self.pm.name] = self.pm.fn
        if self.log:
            print()
        return True
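
The mechanism (rebind a global name on entry, record any exception on exit, then restore the original) can be studied in isolation. In this sketch, the class `Swapped` and the function `double()` are hypothetical; the replacement is passed in directly rather than compiled from mutated source.

```python
def double(x):
    return 2 * x

class Swapped:
    """Temporarily rebinds a global function; records whether the body failed."""
    def __init__(self, name, replacement):
        self.name = name
        self.replacement = replacement
        self.detected = False

    def __enter__(self):
        self.original = globals()[self.name]
        globals()[self.name] = self.replacement
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.detected = exc_type is not None
        globals()[self.name] = self.original  # always restore the original
        return True  # suppress the exception; it has been recorded

faulty = Swapped('double', lambda x: x + 2)
with faulty:
    assert double(3) == 6  # fails for the replacement: 3 + 2 != 6

lucky = Swapped('double', lambda x: x + 2)
with lucky:
    assert double(2) == 4  # passes by coincidence: 2 + 2 == 4

print(faulty.detected, lucky.detected, double(3))
```

The second block shows how a poorly chosen input lets a faulty version slip through, which is what an undetected mutant indicates.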

The `finish()` method simply collects all registered mutants that were not detected into `un_detected`.

In [None]:
from ExpectError import ExpectTimeout

In [None]:
class MuFunctionAnalyzer(MuFunctionAnalyzer):
    def finish(self):
        self.un_detected = {mutant for mutant in self.mutants if not mutant.detected}

The mutation score is computed by `score()`.

In [None]:
class MuFunctionAnalyzer(MuFunctionAnalyzer):
    def score(self):
        return (self.nmutations - len(self.un_detected))/self.nmutations
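
As a worked example of the arithmetic, the following standalone helper computes the same ratio. The function name and the sample numbers are assumptions for illustration; the guard against an empty set of mutants is an addition that the method above does not have.

```python
def mutation_score(n_mutants, n_undetected):
    """Fraction of mutants that were detected (killed)."""
    if n_mutants == 0:
        raise ValueError("no mutants generated; the score is undefined")
    return (n_mutants - n_undetected) / n_mutants

# Five mutants, four of them undetected: one in five is killed
print(mutation_score(5, 4))
# Seven mutants, four of them undetected: three in seven are killed
print(round(mutation_score(7, 4), 2))
```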

Here is how we use our framework.

In [None]:
import sys

In [None]:
for mutant in MuFunctionAnalyzer(triangle):
    with mutant:
        assert triangle(1,1,1) == 'Equilateral', "Equal Check1"
        assert triangle(1,0,1) != 'Equilateral', "Equal Check2"
        assert triangle(1,0,2) != 'Equilateral', "Equal Check3"
mutant.pm.score()

These three assertions, like the `weak_oracle()` test suite below, result in a mutation score of only `20%`.

In [None]:
for mutant in MuFunctionAnalyzer(triangle):
    with mutant:
        weak_oracle(triangle)
mutant.pm.score()

Since we are modifying the global namespace, the function need not be called directly within the `with` block of the mutant; any code that looks up `triangle` in the global namespace sees the mutant.

In [None]:
def oracle():
    strong_oracle(triangle)

In [None]:
for mutant in MuFunctionAnalyzer(triangle):
    with mutant:
        oracle()
mutant.pm.score()

That is, we were able to achieve a `100%` mutation score with the `strong_oracle()` test suite.

Here is another example, `gcd()`.

In [None]:
def gcd(a, b):
    if a<b:
        c: int = a
        a: int = b
        b: int = c

    while b != 0 :
        c: int = a
        a: int = b
        b: int = c % b
    return a

In [None]:
for mutant in MuFunctionAnalyzer(gcd, log=True):
    with mutant:
        assert gcd(1, 0) == 1, "Minimal"
        assert gcd(0, 1) == 1, "Mirror"
mutant.pm.score()

We see that these two assertions obtain a mutation score of about `42%`.

## Mutator for Modules and Test Suites

Consider the `triangle()` program we discussed previously. As we discussed, a simple way to produce a valid mutated version of this program is to replace some of its statements by `pass`.

For demonstration purposes, we would like to proceed as though the program was in a different file. We can do that by producing a `Module` object in Python, and attaching the function to it.

In [None]:
import types

In [None]:
def import_code(code, name):
    # `imp.new_module()` was removed in Python 3.12; `types.ModuleType` replaces it
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module

We attach the `triangle()` function to the `shape` module.

In [None]:
shape = import_code(inspect.getsource(triangle), 'shape')

We can now invoke triangle through the module `shape`.

In [None]:
shape.triangle(1,1,1)
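
Building modules from source strings makes it easy to keep the original and a mutated version side by side. The sketch below is self-contained; the helper `module_from_source()` and the module name `arith` are hypothetical, and it relies on `types.ModuleType` from the standard library.

```python
import types

def module_from_source(code, name):
    """Create a fresh module object and execute `code` in its namespace."""
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module

original = module_from_source("def inc(x):\n    return x + 1\n", 'arith')
mutant = module_from_source("def inc(x):\n    pass\n", 'arith')

print(original.inc(1), mutant.inc(1))
```

Each call yields an independent module object, so creating a mutant never disturbs the original.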

We want to test the `triangle()` function. For that, we define a `StrongShapeTest` class as below.

In [None]:
import unittest

In [None]:
class StrongShapeTest(unittest.TestCase):

    def test_equilateral(self):
        assert shape.triangle(1,1,1) == 'Equilateral'

    def test_isosceles(self):
        assert shape.triangle(1,2,1) == 'Isosceles'
        assert shape.triangle(2,2,1) == 'Isosceles'
        assert shape.triangle(1,2,2) == 'Isosceles'

    def test_scalene(self):
        assert shape.triangle(1,2,3) == 'Scalene'

We define a helper function `suite()` that looks through a given class and identifies the test functions.

In [None]:
def suite(test_class):
    test_suite = unittest.TestSuite()  # named so as not to shadow suite()
    for f in test_class.__dict__:
        if f.startswith('test_'):
            test_suite.addTest(test_class(f))
    return test_suite
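
As an alternative to scanning `__dict__` by hand, the standard library can collect the test methods itself. The sketch below uses `unittest.defaultTestLoader.loadTestsFromTestCase()`; the class `ArithmeticTest` is a hypothetical example.

```python
import unittest

class ArithmeticTest(unittest.TestCase):
    def test_add(self):
        self.assertEqual(1 + 1, 2)

    def test_mul(self):
        self.assertEqual(2 * 3, 6)

    def helper(self):  # not a test: the name does not start with `test`
        pass

loaded = unittest.defaultTestLoader.loadTestsFromTestCase(ArithmeticTest)
result = unittest.TestResult()
loaded.run(result)

print(loaded.countTestCases(), result.testsRun, result.wasSuccessful())
```

Note the differences from the helper above: the loader matches the prefix `test` (without the underscore) and also picks up test methods inherited from base classes, whereas a scan of `__dict__` only sees methods defined directly on the class.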

The tests in the `StrongShapeTest` class can be invoked with different test runners. The simplest is to directly invoke the `run()` method of the `TestSuite`.

In [None]:
suite(StrongShapeTest).run(unittest.TestResult())

The `TextTestRunner` class provides the ability to control the verbosity of execution. It also allows one to stop at the *first* failure.

In [None]:
runner = unittest.TextTestRunner(verbosity=0, failfast=True)
runner.run(suite(StrongShapeTest))

Running the program under coverage is accomplished as follows:

In [None]:
with Coverage() as cov:
    suite(StrongShapeTest).run(unittest.TestResult())

The coverage obtained is given by:

In [None]:
cov.show_coverage(triangle)

In [None]:
class WeakShapeTest(unittest.TestCase):
    def test_equilateral(self):
        assert shape.triangle(1,1,1) == 'Equilateral'

    def test_isosceles(self):
        assert shape.triangle(1,2,1) != 'Equilateral'
        assert shape.triangle(2,2,1) != 'Equilateral'
        assert shape.triangle(1,2,2) != 'Equilateral'

    def test_scalene(self):
        assert shape.triangle(1,2,3) != 'Equilateral'

How much coverage does it obtain?

In [None]:
with Coverage() as cov:
    suite(WeakShapeTest).run(unittest.TestResult())

In [None]:
cov.show_coverage(triangle)

The `MuProgramAnalyzer` is the main class responsible for mutation analysis of the test suite. It accepts the name of the module to be tested, and its source code. It normalizes the given source code by parsing and unparsing it once. This ensures that later `diff`s between the original and a mutant are not derailed by differences in whitespace, comments, etc.

In [None]:
class MuProgramAnalyzer(MuFunctionAnalyzer):
    def __init__(self, name, src):
        self.name = name
        self.ast = ast.parse(src)
        self.src = astunparse.unparse(self.ast)
        self.changes = []
        self.mutator = self.mutator_object()
        self.nmutations = self.get_mutation_count()
        self.un_detected = set()
        
    def mutator_object(self, locations=None):
        return AdvStmtDeletionMutator(self, locations)

The `AdvMutator` provides the base class for implementing individual mutations. It accepts a list of locations to mutate. It assumes that the method `mutable_visit()` is invoked on all nodes of interest as determined by the subclass. When the `AdvMutator` is invoked without a list of locations to mutate, it simply loops through all possible mutation points and retains a count in `self.count`. If it is invoked with a specific list of locations to mutate, the `mutable_visit()` method calls `mutation_visit()`, which performs the mutation on the node. Note that a single location can produce multiple mutations; hence, each change is identified by a `(location, index)` pair.

In [None]:
class AdvMutator(Mutator):
    def __init__(self, analyzer, mutate_locations=None):
        self.count = 0
        self.mutate_locations = [] if mutate_locations is None else mutate_locations
        self.pm = analyzer

    def mutable_visit(self, node):
        self.count += 1  # mutation locations are numbered from 1
        return self.mutation_visit(node)

The `AdvStmtDeletionMutator` simply hooks into all the statement processing visitors. It performs mutation by replacing the given statement with `pass`.

In [None]:
class AdvStmtDeletionMutator(AdvMutator, StmtDeletionMutator):
    def __init__(self, analyzer, mutate_locations=None):
        AdvMutator.__init__(self, analyzer, mutate_locations)
        
    def mutation_visit(self, node):
        index = 0 # there is only one way to delete a statement -- replace it by pass
        if not self.mutate_locations: # counting pass
            self.pm.changes.append((self.count, index))
            return self.generic_visit(node)
        else:
            # get matching changes for this pass 
            mutating_lines = set((count,idx) for (count, idx) in self.mutate_locations)
            if (self.count, index) in mutating_lines:
                return ast.Pass()
            else:
                return self.generic_visit(node)
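
Accepting a collection of locations, rather than a single one, also allows one mutant to carry several changes at once. The following self-contained sketch shows the idea for `return` statements only; the class `MultiReturnDeleter` is hypothetical, and `ast.unparse()` (Python 3.9+) stands in for `astunparse`.

```python
import ast

class MultiReturnDeleter(ast.NodeTransformer):
    """Replaces every `return` whose 1-based position is in `locations` by `pass`."""
    def __init__(self, locations=()):
        self.count = 0
        self.locations = set(locations)

    def visit_Return(self, node):
        self.count += 1
        if self.count in self.locations:
            return ast.Pass()
        return node

SRC = ("def sign(x):\n"
       "    if x < 0:\n"
       "        return -1\n"
       "    if x == 0:\n"
       "        return 0\n"
       "    return 1\n")

# Delete the first and the third `return` in a single mutant
mutant_src = ast.unparse(MultiReturnDeleter({1, 3}).visit(ast.parse(SRC)))

namespace = {}
exec(mutant_src, namespace)
print(mutant_src)
print(namespace['sign'](0), namespace['sign'](5))
```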

We can obtain the number of mutations produced for `triangle()` as follows.

In [None]:
MuProgramAnalyzer('shape', inspect.getsource(triangle)).nmutations

We need a way to obtain the individual mutants. For this, we convert our `MuProgramAnalyzer` to an *iterable*.

In [None]:
class MuProgramAnalyzer(MuProgramAnalyzer):
    def __iter__(self):
        return AdvPMIterator(self)

The `AdvPMIterator`, which is the *iterator* class for `MuProgramAnalyzer`, is defined as follows.

In [None]:
class AdvPMIterator:
    def __init__(self, pm):
        self.pm = pm
        self.idx = 0

The `__next__()` method returns the next `AdvMutant`.

In [None]:
class AdvPMIterator(AdvPMIterator):
    def __next__(self):
        i = self.idx
        if i >= len(self.pm.changes):
            raise StopIteration()
        self.idx += 1
        return AdvMutant(self.pm, [self.pm.changes[i]]) # there could be multiple changes in one mutant

The `AdvMutant` class contains logic for generating mutants when given the locations to mutate.

In [None]:
class AdvMutant(Mutant):
    def __init__(self, pm, locations):
        self.pm = pm
        self.i = locations
        self.name = "%s_%s" % (self.pm.name, '_'.join([str(i) for i in self.i]))
        self._src = None

Here is how it can be used:

In [None]:
shape_src = inspect.getsource(triangle)

In [None]:
for m in MuProgramAnalyzer('shape', shape_src):
    print(m.name)

The `generate_mutant()` method simply calls `mutator_object()` to obtain a mutator, and passes the mutator a copy of the AST.

In [None]:
class AdvMutant(AdvMutant):
    def generate_mutant(self, locations):
        mutant_ast = self.pm.mutator_object(locations).visit(ast.parse(self.pm.src)) # copy
        return astunparse.unparse(mutant_ast)

The `src()` method returns the mutated source.

In [None]:
class AdvMutant(AdvMutant):
    def src(self):
        if self._src is None:
            self._src = self.generate_mutant(self.i)
        return self._src

Here is how one can obtain the mutants, and visualize the difference from the original:

In [None]:
import difflib

We add the `diff()` method to `AdvMutant` so that it can be called directly.

In [None]:
class AdvMutant(AdvMutant):
    def diff(self):
        return '\n'.join(difflib.unified_diff(self.pm.src.split('\n'),
                                              self.src().split('\n'),
                                              fromfile='original',
                                              tofile='mutant',
                                              n=3))

In [None]:
for mutant in MuProgramAnalyzer('shape', inspect.getsource(triangle)):
    print(mutant.name)
    print(mutant.diff())
    break

We are now ready to implement the actual evaluation. For that, we require the ability to accept the module where the test suite is defined, and invoke the test method on it. The method `__getitem__()` accepts the test module, fixes the import entries on the test module to correctly point to the mutant module, and passes it to the test runner `MutantTestRunner`.

In [None]:
class AdvMutant(AdvMutant):
    def __getitem__(self, test_module):
        test_module.__dict__[self.pm.name] = import_code(self.src(), self.pm.name)
        return MutantTestRunner(self, test_module)
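
The effect of rebinding a name in the test module's namespace can be seen in a small, self-contained sketch. All names here (`module_from_source()`, the modules `tests` and `arith`, and the check `check_inc()`) are hypothetical; the test code refers to `arith` as a global, just as the test classes above refer to `shape`.

```python
import types

def module_from_source(code, name):
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module

TESTS = "def check_inc():\n    assert arith.inc(1) == 2\n"
ORIGINAL = "def inc(x):\n    return x + 1\n"
MUTANT = "def inc(x):\n    pass\n"

tests = module_from_source(TESTS, 'tests')

def passes(tests, source):
    """Point the test module's `arith` at a module built from `source`,
    then run the check."""
    tests.__dict__['arith'] = module_from_source(source, 'arith')
    try:
        tests.check_inc()
    except AssertionError:
        return False
    return True

print(passes(tests, ORIGINAL), passes(tests, MUTANT))
```

Because the name is looked up in the test module's namespace at call time, the tests themselves need no changes to run against a mutant. The binding stays in place afterwards, so it should be reset before the tests are used against the original again.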

The `MutantTestRunner` simply calls all `test_` methods of the given test class, checks if the mutant was discovered, and returns the result.

In [None]:
from ExpectError import ExpectTimeout

In [None]:
class MutantTestRunner:
    def __init__(self, mutant, test_module):
        self.mutant = mutant
        self.tm = test_module
        
    def runTest(self, tc):
        suite = unittest.TestSuite()
        test_class = self.tm.__dict__[tc]
        for f in test_class.__dict__:
            if f.startswith('test_'):
                suite.addTest(test_class(f))
        runner = unittest.TextTestRunner(verbosity=0, failfast=True)
        try:
            with ExpectTimeout(1):
                res = runner.run(suite)
                if res.wasSuccessful():
                    self.mutant.pm.un_detected.add(self.mutant)  # record the surviving mutant
                return res
        except SyntaxError:
            print('Syntax Error (%s)' % self.mutant.name)
            return None
        raise Exception('Unhandled exception during test execution')
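
The core decision of the runner (a mutant is killed if the suite does not pass) can be sketched without the rest of the framework. The example below wraps plain functions with `unittest.FunctionTestCase` and writes the runner output to an in-memory stream; the names `killed_by()`, `make_check()`, `inc()`, and `inc_mutant()` are hypothetical. It does not include the timeout handling of the class above.

```python
import io
import unittest

def inc(x):
    return x + 1

def inc_mutant(x):
    pass  # the `return` statement has been deleted

def make_check(fn):
    """Build a check for one implementation of inc()."""
    def check():
        assert fn(1) == 2
    return check

def killed_by(checks):
    """Run the checks as a unittest suite; report whether any of them failed."""
    suite = unittest.TestSuite(unittest.FunctionTestCase(c) for c in checks)
    runner = unittest.TextTestRunner(stream=io.StringIO(),
                                     verbosity=0, failfast=True)
    return not runner.run(suite).wasSuccessful()

print(killed_by([make_check(inc)]), killed_by([make_check(inc_mutant)]))
```

With `failfast=True`, the run stops at the first failing check, which is sufficient here since one failure is enough to kill a mutant.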

The mutation score is computed by `score()`.

In [None]:
class MuProgramAnalyzer(MuProgramAnalyzer):
    def score(self):
        return (self.nmutations - len(self.un_detected))/self.nmutations

Here is how we use our framework.

In [None]:
import sys

In [None]:
test_module = sys.modules[__name__]
for mutant in MuProgramAnalyzer('shape', shape_src):
    mutant[test_module].runTest('WeakShapeTest')
mutant.pm.score()

The `WeakShapeTest` test suite resulted in a mutation score of only `20%`.

In [None]:
for mutant in MuProgramAnalyzer('shape', shape_src):
    mutant[test_module].runTest('StrongShapeTest')
mutant.pm.score()

On the other hand, we were able to achieve a `100%` mutation score with the `StrongShapeTest` test suite.

Here is another example, `gcd()`.

In [None]:
gcd_src = inspect.getsource(gcd)

In [None]:
class TestGCD(unittest.TestCase):
    def test_simple(self):
        assert cfg.gcd(1,0) == 1
        
    def test_mirror(self):
        assert cfg.gcd(0,1) == 1

In [None]:
for mutant in MuProgramAnalyzer('cfg', gcd_src):
    mutant[test_module].runTest('TestGCD')
mutant.pm.score()

We see that our `TestGCD` test suite is able to obtain a mutation score of about `42%`.

## Dealing with Immortal (Equivalent) Mutants

\todo{Check with Marcel about equivalent mutant estimation using STADS paper}

## Dealing with Redundant Mutants

\todo{Check with Marcel about redundant mutant estimation}

## Lessons Learned

* We have learned why structural coverage is insufficient to evaluate the quality of test suites.
* We have learned how to use Mutation Analysis for evaluating test suite quality.
* We have learned the limitations of Mutation Analysis -- Immortal and Redundant mutants, and how to estimate them.

## Next Steps

* While naive fuzzing generates poor-quality oracles, techniques such as [symbolic](SymbolicFuzzer.ipynb) and [concolic](ConcolicFuzzer.ipynb) fuzzing can enhance the quality of the oracles used in fuzzing.
* [Dynamic invariants](DynamicInvariants.ipynb) can also be of great help in improving the quality of oracles.

## Background

The idea of Mutation Analysis was first introduced by Lipton et al.

## Exercises

### Exercise 1:  Arithmetic Expression Mutators

Our simple statement deletion mutation is only one of the ways in which a program could be mutated. Another category of mutants is *expression mutation*, where arithmetic operators such as `{+,-,*,/}` are substituted for one another. For example, given an expression such as
```
x = x + 1
```
one can mutate it to
```
x = x - 1
```
and
```
x = x * 1
```
and
```
x = x / 1
```
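
As a possible starting point (not a solution to the exercise), the following sketch shows how such variants can be produced for a single statement by swapping the operator of a `BinOp` node. The class `SwapFirstBinOp` and the function `arithmetic_variants()` are hypothetical and are not integrated with the analyzers of this chapter; `ast.unparse()` requires Python 3.9 or later.

```python
import ast

OPERATORS = [ast.Add, ast.Sub, ast.Mult, ast.Div]

class SwapFirstBinOp(ast.NodeTransformer):
    """Replaces the operator of the first binary operation encountered."""
    def __init__(self, new_op):
        self.new_op = new_op
        self.done = False

    def visit_BinOp(self, node):
        if not self.done:
            self.done = True
            node.op = self.new_op()
        return self.generic_visit(node)

def arithmetic_variants(src):
    """Return all variants of `src` that differ in the first arithmetic operator."""
    original = ast.unparse(ast.parse(src))
    variants = []
    for op in OPERATORS:
        variant = ast.unparse(SwapFirstBinOp(op).visit(ast.parse(src)))
        if variant != original:  # skip the replacement that changes nothing
            variants.append(variant)
    return variants

print(arithmetic_variants("x = x + 1"))
```

Note that one location yields three mutants here, which is the property the exercise asks you to accommodate.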

I. Can you produce an `AdvArithmeticOperatorMutator` that produces such mutants?

II. Can you fix the function mutator such that one can write `ArithmeticOperatorMutator` also? (Why is it not possible with the current implementation?)

**Hint.** Check how many mutants can be produced per token for the function mutator.

**Solution.** None provided.

### Exercise 2: Byte Code Mutator

We have seen how to mutate the AST given the source. One of the deficiencies of this approach is that Python bytecode is targeted by other languages too. In such cases, the source may not be readily converted to a Python AST, and it is desirable to mutate the bytecode instead. Can you implement a bytecode mutator for Python functions that mutates the bytecode instead of fetching the source and then mutating it?

**Solution.** None provided.