# Statistical Debugging

Let us correlate events with failures!

**Prerequisites**

* You should have read the [Chapter on Tracing](Tracer.ipynb).

In [None]:
import bookutils

## Synopsis
<!-- Automatically generated. Do not edit. -->

To [use the code provided in this chapter](Importing.ipynb), write

```python
>>> from debuggingbook.StatisticalDebugger import <identifier>
```

and then make use of the following features.


_For those only interested in using the code in this chapter (without wanting to know how it works), give an example.  This will be copied to the beginning of the chapter (before the first section) as text with rendered input and output._

For instance, this is what we get for `x=1`:

You can use `int_fuzzer()` as:

```python
>>> print(2 + 2)
4
```


## All these Events!


We start with a base class to collect events:

In [None]:
from Tracer import Tracer

In [None]:
class Collector(Tracer):
    """A class to record events during execution."""

    def collect(self, frame, event, arg):
        """Collecting function. To be overridden in subclasses."""
        pass

    def traceit(self, frame, event, arg):
        self.collect(frame, event, arg)

In [None]:
class Collector(Collector):
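    def __init__(self):
        super().__init__()  # Initialize the Tracer base class
        self._id = None

    def traceit(self, frame, event, arg):
        if self._id is None and event == "call":
            # Use the first call (function name and arguments) as ID
            function = frame.f_code.co_name
            local_vars = frame.f_locals  # avoid shadowing built-in locals()
            args = ", ".join([f"{var}={repr(local_vars[var])}"
                              for var in local_vars])
            self._id = f"{function}({args})"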

        self.collect(frame, event, arg)

    def id(self):
        return self._id
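
How the ID comes about can be tried out without the `Tracer` class. The following is a minimal sketch, assuming plain `sys.settrace()` in place of `Tracer`; the helper `first_call_id()` and the function `add()` are made up for illustration and are not part of this chapter's code.

```python
import sys

def first_call_id(function, *args):
    """Run function(*args); return an ID string built from the first call seen."""
    call_id = None

    def tracer(frame, event, arg):
        nonlocal call_id
        if call_id is None and event == "call":
            name = frame.f_code.co_name
            params = ", ".join(f"{var}={value!r}"
                               for var, value in frame.f_locals.items())
            call_id = f"{name}({params})"
        return None  # no line events needed for this sketch

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(*args)
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return call_id

def add(a, b):  # hypothetical function under observation
    return a + b

print(first_call_id(add, 1, 2))  # add(a=1, b=2)
```

At the time of the `call` event, the frame's local variables hold exactly the arguments, which is why they can serve as a description of the run.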

In [None]:
from Intro_Debugging import remove_html_markup

In [None]:
with Collector():
    remove_html_markup('abc')

Let's extend this to collect coverage:

In [None]:
class CoverageCollector(Collector):
    def __init__(self):
        super().__init__()
        self.coverage = set()

    def collect(self, frame, event, arg):
        self.coverage.add(frame.f_lineno)

In [None]:
class CoverageCollector(CoverageCollector):
    def events(self):
        """Return a set of predicates holding for the execution"""
        return self.coverage

In [None]:
c = CoverageCollector()
with c:
    remove_html_markup('abc')
print(c.events())

In [None]:
print(c.id())

In [None]:
import inspect

In [None]:
def list_with_coverage(function, coverage):
    source_lines, starting_line_number = \
       inspect.getsourcelines(function)

    line_number = starting_line_number
    for line in source_lines:
        marker = '*' if line_number in coverage else ' '
        print(f"{line_number:4} {marker} {line}", end='')
        line_number += 1

In [None]:
list_with_coverage(remove_html_markup, c.coverage)

The interesting parts are the lines _not_ covered.
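
The uncovered lines can also be computed rather than read off the listing. This is a minimal sketch, assuming `dis.findlinestarts()` as the source of executable line numbers and `sys.settrace()` for coverage; `uncovered_lines()` and `sign()` are hypothetical names, not part of this chapter's code.

```python
import dis
import sys

def uncovered_lines(function, *args):
    """Return the executable lines of `function` not reached by function(*args)."""
    code = function.__code__
    covered = set()

    def tracer(frame, event, arg):
        if frame.f_code is code:
            covered.add(frame.f_lineno)
            return tracer   # keep receiving line events for this frame
        return None         # ignore all other functions

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(*args)
    finally:
        sys.settrace(previous)

    # Line numbers that have bytecode attached, i.e. that can be executed
    executable = {line for _, line in dis.findlinestarts(code)
                  if line is not None}
    return executable - covered

def sign(x):  # hypothetical function under observation
    if x < 0:
        return -1
    return 1

missed = uncovered_lines(sign, 5)
first = sign.__code__.co_firstlineno
print([line - first for line in sorted(missed)])  # [2]: the `return -1` line
```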

## Computing Differences

In [None]:
c = CoverageCollector()
with c:
    remove_html_markup('abc')
print(c.events())

In [None]:
class StatisticalDebugger():
    def __init__(self, collector_class):
        self.collector_class = collector_class
        self.collectors = {}

In [None]:
class StatisticalDebugger(StatisticalDebugger):
    def collect(self, name, *args):
        collector = self.collector_class(*args)
        if name not in self.collectors:
            self.collectors[name] = []
        self.collectors[name].append(collector)
        return collector

In [None]:
from IPython.display import display, Markdown, HTML

In [None]:
class StatisticalDebugger(StatisticalDebugger):
    def event_table(self, show_ids=False):
        sep = ' | '

        all_events = set()
        for name in self.collectors:
            for collector in self.collectors[name]:
                all_events.update(collector.events())

        longest_event = max(len(f"{event}") for event in all_events)

        out = ""

        # Header
        if show_ids:
            out += '| ' + ' ' * longest_event + sep
            for name in self.collectors:
                for collector in self.collectors[name]:
                    out += '`' + collector.id() + '`' + sep
            out += '\n'
        else:
            out += '| ' + ' ' * longest_event + sep
            for name in self.collectors:
                for i in range(len(self.collectors[name])):
                    out += name + sep
            out += '\n'

        out += '| ' + '-' * longest_event + sep
        for name in self.collectors:
            for i in range(len(self.collectors[name])):
                out += '-' * len(name) + sep
        out += '\n'

        # Data
        for event in all_events:
            out += f"| {repr(event).rjust(longest_event)}" + sep
            for name in self.collectors:
                for collector in self.collectors[name]:
                    out += ' ' * (len(name) - 1)
                    if event in collector.events():
                        out += "X"
                    else:
                        out += "-"
                    out += sep
            out += '\n'

        return Markdown(out)

In [None]:
class DifferenceDebugger(StatisticalDebugger):
    PASS = 'pass'
    FAIL = 'fail'

    def collect_pass(self):
        return self.collect(self.PASS)
    def collect_fail(self):
        return self.collect(self.FAIL)

    def pass_collectors(self):
        return self.collectors[self.PASS]
    def fail_collectors(self):
        return self.collectors[self.FAIL]

In [None]:
def test_debugger_html(debugger):
    with debugger.collect_pass():
        remove_html_markup('abc')
    with debugger.collect_pass():
        remove_html_markup('<b>abc</b>')
    with debugger.collect_fail():
        remove_html_markup('<b bar="foo"></b>')
    return debugger

In [None]:
debugger = test_debugger_html(DifferenceDebugger(CoverageCollector))

In [None]:
debugger.event_table()

In [None]:
pass_1_events = debugger.pass_collectors()[0].events()

In [None]:
pass_2_events = debugger.pass_collectors()[1].events()

In [None]:
in_any_pass = pass_1_events | pass_2_events
in_any_pass

In [None]:
fail_events = debugger.fail_collectors()[0].events()

In [None]:
only_in_fail = fail_events - in_any_pass
only_in_fail

In [None]:
list_with_coverage(remove_html_markup, only_in_fail)

In [None]:
class DifferenceDebugger(DifferenceDebugger):
    def all_events(self, category=None):
        in_any = set()
        if category:
            for collector in self.collectors[category]:
                in_any.update(collector.events())
        else:
            for category in self.collectors:
                for collector in self.collectors[category]:
                    in_any.update(collector.events())
        return in_any

    def all_fail(self):
        return self.all_events(self.FAIL)

    def all_pass(self):
        return self.all_events(self.PASS)

    def only_fail(self):
        return self.all_fail() - self.all_pass()

    def only_pass(self):
        return self.all_pass() - self.all_fail()

In [None]:
debugger = test_debugger_html(DifferenceDebugger(CoverageCollector))

In [None]:
debugger.all_events()

In [None]:
debugger.only_fail()

In [None]:
debugger.only_pass()
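
The set arithmetic behind `only_fail()` and `only_pass()` can be followed on a small example. The event sets below are invented for illustration; they are not the actual coverage of `remove_html_markup()`.

```python
# Hypothetical line coverage of two passing runs and one failing run
pass_runs = [{1, 2, 3, 5}, {1, 2, 4, 5}]
fail_runs = [{1, 2, 3, 6}]

all_pass = set().union(*pass_runs)   # events seen in any passing run
all_fail = set().union(*fail_runs)   # events seen in any failing run

only_fail = all_fail - all_pass      # seen exclusively in failing runs
only_pass = all_pass - all_fail      # seen exclusively in passing runs

print(only_fail)  # {6}
print(only_pass)  # {4, 5}
```

Line 3 occurs in the failing run, but also in a passing run, so it does not show up in `only_fail`.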

## Visualizing Differences

### Discrete Spectrum

In [None]:
class DiscreteSpectrumDebugger(DifferenceDebugger):
    def color(self, line_number):
        passing = self.all_events(self.PASS)
        failing = self.all_events(self.FAIL)

        if line_number in passing and line_number in failing:
            return 'lightyellow'
        elif line_number in failing:
            return 'mistyrose'
        elif line_number in passing:
            return 'honeydew'
        else:
            return None

In [None]:
class DiscreteSpectrumDebugger(DiscreteSpectrumDebugger):
    def list_with_spectrum(self, function):
        source_lines, starting_line_number = \
           inspect.getsourcelines(function)

        line_number = starting_line_number
        out = ""
        for line in source_lines:
            if line.strip() == '':
                line = '&nbsp;'

            line = str(line_number).rjust(4) + ' ' + line

            color = self.color(line_number)
            if color:
                line = f'<pre style="background-color:{color}">' \
                        f'{line.rstrip()}</pre>'
            else:
                line = f'<pre>{line}</pre>'

            out += line
            line_number += 1

        return HTML(out)

In [None]:
debugger = test_debugger_html(DiscreteSpectrumDebugger(CoverageCollector))

In [None]:
debugger.list_with_spectrum(remove_html_markup)

### Continuous Spectrum

We introduce the Tarantula method for highlighting differences. The color is defined as follows:

$$\textit{color}(\textit{line}) = \textit{low color(red)} + \frac{\%\textit{passed}(\textit{line})}{\%\textit{passed}(\textit{line}) + \%\textit{failed}(\textit{line})} \times \textit{color range}$$
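
As a worked example, the fraction in this formula can be computed for a few combinations of $\%\textit{passed}$ and $\%\textit{failed}$. The standalone function `hue()` below is a sketch that takes the two fractions directly as arguments; the input values are invented.

```python
def hue(passed, failed):
    """Return passed / (passed + failed), or None if the line was never executed."""
    if passed + failed == 0:
        return None
    return passed / (passed + failed)

print(hue(0.0, 1.0))  # 0.0: executed only in failing runs (red end)
print(hue(1.0, 1.0))  # 0.5: executed in all passing and all failing runs
print(hue(0.5, 0.0))  # 1.0: executed only in passing runs
print(hue(0.0, 0.0))  # None: never executed, hence no color
```

A line executed in every failing run, but only in a few passing runs, ends up close to 0, that is, close to red.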

In [None]:
class ContinuousSpectrumDebugger(DiscreteSpectrumDebugger):
    def event_fraction(self, event, category):
        all_runs = self.collectors[category]
        runs_with_event = set(collector for collector in all_runs 
                              if event in collector.events())
        fraction = len(runs_with_event) / len(all_runs)
        # print(f"%{category}({event}) = {fraction}")
        return fraction

    def passed(self, line_number):
        return self.event_fraction(line_number, self.PASS)

    def failed(self, line_number):
        return self.event_fraction(line_number, self.FAIL)

    def hue(self, line_number):
        passed = self.passed(line_number)
        failed = self.failed(line_number)
        if passed + failed > 0:
            return passed / (passed + failed)
        else:
            return None

In [None]:
debugger = test_debugger_html(ContinuousSpectrumDebugger(CoverageCollector))

In [None]:
for line in debugger.only_fail():
    print(line, debugger.hue(line))

In [None]:
for line in debugger.only_pass():
    print(line, debugger.hue(line))

The brightness is defined as follows:

$$\textit{bright}(\textit{line}) = \max(\%\textit{passed}(\textit{line}), \%\textit{failed}(\textit{line}))$$
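
To see how hue and brightness combine into one color, here is a minimal standalone sketch. It assumes the mapping used in this chapter (hue scaled to 0 to 120 degrees, brightness used as HSL saturation, lightness fixed at 80%); rounding the numbers is an addition made here for readable output, and `hsl_color()` is a hypothetical name.

```python
def bright(passed, failed):
    """Brightness: the larger of the two fractions."""
    return max(passed, failed)

def hsl_color(passed, failed):
    """Return a CSS hsl() color for the given fractions, or None if never executed."""
    if passed + failed == 0:
        return None
    hue = passed / (passed + failed)      # 0.0 = red end, 1.0 = green end
    saturation = bright(passed, failed)   # more runs, more intense color
    return f"hsl({hue * 120:.0f}, {saturation * 100:.0f}%, 80%)"

print(hsl_color(0.0, 1.0))  # hsl(0, 100%, 80%)
print(hsl_color(1.0, 1.0))  # hsl(60, 100%, 80%)
print(hsl_color(0.2, 0.0))  # hsl(120, 20%, 80%)
```

The last example shows the effect of brightness: a line executed in only 20% of the passing runs is green, but pale.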

In [None]:
class ContinuousSpectrumDebugger(ContinuousSpectrumDebugger):
    def bright(self, line):
        return max(self.passed(line), self.failed(line))

In [None]:
debugger = test_debugger_html(ContinuousSpectrumDebugger(CoverageCollector))
for line in debugger.only_fail():
    print(line, debugger.bright(line))

In [None]:
class ContinuousSpectrumDebugger(ContinuousSpectrumDebugger):
    def color(self, line):
        hue = self.hue(line)
        if hue is None:
            return None
        saturation = self.bright(line)

        # HSL color values are specified with:
        # hsl(hue, saturation, lightness).
        # A hue of 0 is red; a hue of 120 is green.
        return f"hsl({hue * 120}, {saturation * 100}%, 80%)"

In [None]:
debugger = test_debugger_html(ContinuousSpectrumDebugger(CoverageCollector))

In [None]:
for line in debugger.only_fail():
    print(line, debugger.color(line))

In [None]:
for line in debugger.only_pass():
    print(line, debugger.color(line))

In [None]:
debugger.list_with_spectrum(remove_html_markup)

Here's another example, taken from the Tarantula paper. Note that `middle()` is faulty: `middle(2, 1, 3)` returns 1 rather than 2.

In [None]:
def middle(x, y, z):
    if y < z:
        if x < y:
            return y
        elif x < z:
            return y
    else:
        if x > y:
            return y
        elif x > z:
            return x
    return z

In [None]:
def test_debugger_middle(debugger):
    with debugger.collect_pass():
        middle(3, 3, 5)
    with debugger.collect_pass():
        middle(1, 2, 3)
    with debugger.collect_pass():
        middle(3, 2, 1)
    with debugger.collect_pass():
        middle(5, 5, 5)
    with debugger.collect_pass():
        middle(5, 3, 4)
    with debugger.collect_fail():
        middle(2, 1, 3)
    return debugger

Note that in order to collect data from multiple function invocations, you need to have a separate `with` clause for every invocation. The following will _not_ work correctly:

```python
    with debugger.collect_pass():
        middle(3, 3, 5)
        middle(1, 2, 3)
        ...
```
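
The reason is that a collector represents exactly one run: its ID is taken from the first call only, and all events end up in the same set. The following sketch illustrates the effect with a hypothetical `LineRecorder` class, a simplified stand-in for a collector built on `sys.settrace()`; it is not this chapter's `Collector`, and `sign()` is an invented example function.

```python
import sys

class LineRecorder:
    """Hypothetical stand-in for a collector: records which lines of `function`
    run inside one `with` block, as offsets from its `def` line."""

    def __init__(self, function):
        self.code = function.__code__
        self.lines = set()

    def _trace(self, frame, event, arg):
        if frame.f_code is self.code:
            self.lines.add(frame.f_lineno - self.code.co_firstlineno)
            return self._trace
        return None  # ignore everything else, including __exit__()

    def __enter__(self):
        self._previous = sys.gettrace()
        sys.settrace(self._trace)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        sys.settrace(self._previous)
        return False

def sign(x):
    if x < 0:
        return -1
    return 1

# One `with` block per invocation: two separate observations
separate = []
for x in [5, -5]:
    with LineRecorder(sign) as recorder:
        sign(x)
    separate.append(recorder.lines)
print(separate)      # [{0, 1, 3}, {0, 1, 2}]

# Both invocations in one block: a single, merged observation
with LineRecorder(sign) as merged:
    sign(5)
    sign(-5)
print(merged.lines)  # {0, 1, 2, 3}
```

In the merged case, the information that the two runs took different branches is lost. A loop with one `with` block per input, as in the first variant, keeps the runs apart.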

In [None]:
debugger = test_debugger_middle(ContinuousSpectrumDebugger(CoverageCollector))

In [None]:
debugger.event_table()

In [None]:
debugger.list_with_spectrum(middle)

## Ranking Lines by Suspiciousness

### The Tarantula Metric
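
A common way to turn the pass and fail fractions into a ranking is the Tarantula suspiciousness, $\%\textit{failed} / (\%\textit{passed} + \%\textit{failed})$, which is one minus the hue defined for the continuous spectrum. The sketch below assumes this definition; the function name and the per-line fractions are invented for illustration.

```python
def tarantula_suspiciousness(passed, failed):
    """Return failed / (passed + failed), or None if the line was never executed."""
    if passed + failed == 0:
        return None
    return failed / (passed + failed)

# Hypothetical data: line number -> (%passed, %failed)
fractions = {
    10: (1.0, 1.0),   # executed in all runs
    11: (0.2, 1.0),   # executed mostly in failing runs
    12: (0.6, 0.0),   # executed only in passing runs
}

ranking = sorted(fractions,
                 key=lambda line: tarantula_suspiciousness(*fractions[line]),
                 reverse=True)
print(ranking)  # [11, 10, 12]
```

The most suspicious line comes first; a programmer would inspect the lines in this order.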

### The Ochiai Metric
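
The Ochiai metric, as it is commonly defined in the fault localization literature, works on run counts rather than fractions: $\textit{failed}(e) / \sqrt{\textit{total failed} \times (\textit{failed}(e) + \textit{passed}(e))}$. The following is a minimal sketch under that assumption; the function name, parameter names, and example counts are hypothetical.

```python
import math

def ochiai_suspiciousness(failed_with_event, passed_with_event, total_failed):
    """Ochiai score for an event, computed from run counts.
    Returns None if the score is undefined (no failing runs or event never seen)."""
    denominator = math.sqrt(total_failed *
                            (failed_with_event + passed_with_event))
    if denominator == 0:
        return None
    return failed_with_event / denominator

# One failing run in total
print(ochiai_suspiciousness(1, 0, 1))  # 1.0: seen only in the failing run
print(ochiai_suspiciousness(1, 1, 1))  # about 0.707: also seen in one passing run
print(ochiai_suspiciousness(0, 3, 1))  # 0.0: seen only in passing runs
```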

### How Effective is Ranking?

## Other Events besides Coverage

Our framework allows for tracking arbitrary events, not just coverage.

In [None]:
class ValueCollector(Collector):
    def __init__(self):
        super().__init__()
        self.vars = set()

    def collect(self, frame, event, arg):
        local_vars = frame.f_locals
        for var in local_vars:
            value = local_vars[var]
            self.vars.add((var, value))

    def events(self):
        return self.vars
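
`ValueCollector` stores `(var, value)` pairs in a set, which requires every value to be hashable; a local variable holding a list or a dictionary would raise a `TypeError`. One possible workaround, sketched here as an assumption rather than as part of this chapter's design, is to record `repr(value)` instead of the value itself. The helper `value_events()` is hypothetical.

```python
def value_events(local_vars):
    """Turn a mapping of variable names to values into a set of hashable events."""
    return {(var, repr(value)) for var, value in local_vars.items()}

# Hypothetical local variables, including an unhashable list
events = value_events({"s": "<b>", "out": ["a", "b"], "tag": False})
print(sorted(events))
# [('out', "['a', 'b']"), ('s', "'<b>'"), ('tag', 'False')]
```

The price is that two values with the same representation become the same event.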

In [None]:
debugger = test_debugger_html(DifferenceDebugger(ValueCollector))
debugger.event_table()

In [None]:
debugger.only_fail()

In [None]:
debugger = test_debugger_middle(DifferenceDebugger(ValueCollector))
debugger.event_table()

In [None]:
debugger.only_fail()

### Training a Classifier

In [None]:
from sklearn import tree

In [None]:
class ClassifyingDebugger(DifferenceDebugger):
    PASS_VALUE = +1
    FAIL_VALUE = -1

    def samples(self):
        samples = {}
        for collector in self.pass_collectors():
            samples[collector.id()] = self.PASS_VALUE
        for collector in self.fail_collectors():
            samples[collector.id()] = self.FAIL_VALUE
        return samples

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
debugger.samples()

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def features(self):
        features = {}
        for collector in self.pass_collectors():
            features[collector.id()] = collector.events()
        for collector in self.fail_collectors():
            features[collector.id()] = collector.events()
        return features

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
debugger.features()

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def feature_names(self):
        return [repr(feature) for feature in self.all_events()]

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
debugger.feature_names()

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def shape(self, sample):
        x = []
        features = self.features()
        for f in self.all_events():
            if f in features[sample]:
                x += [+1]
            else:
                x += [-1]
        return x

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
# The ID lists the arguments in declaration order
debugger.shape('middle(x=3, y=3, z=5)')

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def X(self):
        X = []
        samples = self.samples()
        for key in samples:
            X += [self.shape(key)]
        return X

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
debugger.X()

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def Y(self):
        Y = []
        samples = self.samples()
        for key in samples:
            Y += [samples[key]]
        return Y
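
The encoding performed by `shape()`, `X()`, and `Y()` can be followed on a small standalone example. This sketch assumes the same +1/-1 encoding as above; it sorts the events to make the column order explicit, which is a choice made here for illustration. The function `encode()`, the run IDs, and the event sets are invented.

```python
def encode(runs, outcomes):
    """runs: run ID -> set of events; outcomes: run ID -> +1 (pass) or -1 (fail).
    Returns the column order, the feature matrix X, and the label vector Y."""
    events = sorted(set().union(*runs.values()))  # fixed column order
    X = [[+1 if event in runs[run] else -1 for event in events]
         for run in runs]
    Y = [outcomes[run] for run in runs]
    return events, X, Y

# Hypothetical runs with their covered lines
runs = {
    "f(x=1)": {1, 2, 4},
    "f(x=2)": {1, 3, 4},
    "f(x=3)": {1, 2, 3},
}
outcomes = {"f(x=1)": +1, "f(x=2)": +1, "f(x=3)": -1}

events, X, Y = encode(runs, outcomes)
print(events)  # [1, 2, 3, 4]
print(X)       # [[1, 1, -1, 1], [1, -1, 1, 1], [1, 1, 1, -1]]
print(Y)       # [1, 1, -1]
```

Each row of `X` describes one run and each column one event. Since runs are keyed by their ID, two runs with identical IDs would collapse into a single row.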

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
debugger.Y()

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def classifier(self):
        classifier = tree.DecisionTreeClassifier()
        classifier = classifier.fit(self.X(), self.Y())
        return classifier

In [None]:
import graphviz

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def show_classifier(self, classifier):
        dot_data = tree.export_graphviz(classifier, out_file=None,
                                        filled=False, rounded=True,
                                        feature_names=self.feature_names(),
                                        class_names=["fail", "pass"],
                                        impurity=False,
                                        special_characters=True)
        dot_data = dot_data.replace('&le; 0.0', ': no')
        dot_data = dot_data.replace('&ge; 0.0', ': yes')

        return graphviz.Source(dot_data)

This is the tree we get. A decision like `* <= 0` (shown as `*: no` in the diagram) means that the event `*` (here, a line number) did not occur in the run.

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
classifier = debugger.classifier()
debugger.show_classifier(classifier)

In [None]:
class ClassifyingDebugger(ClassifyingDebugger):
    def predict(self, classifier, sample):
        return classifier.predict([self.shape(sample)])

In [None]:
debugger = test_debugger_middle(ClassifyingDebugger(CoverageCollector))
# debugger.predict(classifier, set(166))


## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

_Close the chapter with a few exercises such that people have things to do.  To make the solutions hidden (to be revealed by the user), have them start with_

```
**Solution.**
```

_Your solution can then extend up to the next title (i.e., any markdown cell starting with `#`)._

_Running `make metadata` will automatically add metadata to the cells such that the cells will be hidden by default, and can be uncovered by the user.  The button will be introduced above the solution._

### Exercise 1: _Title_

_Text of the exercise_

In [None]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [None]:
# Some code for the solution
2 + 2

_Some more text for the solution_

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_