# Correlating Failures

In this chapter, we correlate events observed during execution, such as the lines covered, with test failures.

**Prerequisites**

* You should have read the [Chapter on Tracing](Tracer.ipynb).

In [None]:
import bookutils

## All these Events!


We start with a base class to collect events:

In [None]:
from Tracer import Tracer

In [None]:
class Collector(Tracer):
    """A class to record events during execution."""

    def collect(self, frame, event, arg):
        """Collecting function. To be overridden in subclasses."""
        pass

    def traceit(self, frame, event, arg):
        self.collect(frame, event, arg)

In [None]:
from Intro_Debugging import remove_html_markup

In [None]:
with Collector():
    remove_html_markup('abc')
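
To illustrate the mechanism without relying on the `Tracer` chapter, here is a minimal self-contained sketch. It assumes that tracing is built on Python's `sys.settrace()`; the class `EventCollector` and the function `square()` are hypothetical names introduced only for this illustration.

```python
import sys


class EventCollector:
    """Hypothetical stand-in for a collector: records (event, function, line)."""

    def __init__(self):
        self.events = []

    def collect(self, frame, event, arg):
        self.events.append((event, frame.f_code.co_name, frame.f_lineno))

    def _traceit(self, frame, event, arg):
        self.collect(frame, event, arg)
        return self._traceit  # keep tracing lines inside this frame

    def __enter__(self):
        self._original = sys.gettrace()
        sys.settrace(self._traceit)
        return self

    def __exit__(self, exc_type, exc_value, tb):
        sys.settrace(self._original)  # restore the previous trace function
        return False  # do not swallow exceptions


def square(x):
    return x * x


with EventCollector() as collector:
    square(3)

# Keep only the events that belong to square()
square_events = [event for event, name, _ in collector.events
                 if name == "square"]
print(square_events)
```

Each call produces a `call` event, one `line` event per executed line, and a `return` event; a subclass only needs to decide what to keep in `collect()`.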

Let's extend this to collect coverage:

In [None]:
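class CoverageCollector(Collector):
    """A class to record the line numbers executed."""

    def __init__(self):
        super().__init__()
        self.coverage = set()

    def collect(self, frame, event, arg):
        # Records the line number of every traced event,
        # regardless of which function the frame belongs to
        self.coverage.add(frame.f_lineno)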

In [None]:
class CoverageCollector(CoverageCollector):
    def events(self):
        """Return a set of predicates holding for the execution"""
        return self.coverage

In [None]:
c = CoverageCollector()
with c:
    remove_html_markup('abc')
print(c.events())

In [None]:
import inspect

In [None]:
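def show_covered_lines(function, coverage):
    """Print the source of `function`, marking lines in `coverage` with '*'."""
    source_lines, starting_line_number = \
       inspect.getsourcelines(function)

    line_number = starting_line_number
    for line in source_lines:
        # starting_line_number is the number of the first source line,
        # so line_number already corresponds to frame.f_lineno
        marker = '*' if line_number in coverage else ' '
        print(marker, line, end='')
        line_number = line_number + 1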

In [None]:
show_covered_lines(remove_html_markup, c.coverage)

The interesting part is the lines that are _not_ covered.
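
As a sketch of how uncovered lines could be computed, the following self-contained example subtracts the executed lines from the lines that carry bytecode. It uses `sys.settrace()` and `dis.findlinestarts()` from the standard library; the helpers `traced_lines()` and `executable_lines()` and the function `sign()` are hypothetical names, not part of this chapter's code.

```python
import dis
import sys


def traced_lines(function, *args):
    """Run function(*args); return the set of its line numbers that executed."""
    code = function.__code__
    covered = set()

    def tracer(frame, event, arg):
        if frame.f_code is code:  # ignore frames of other functions
            covered.add(frame.f_lineno)
        return tracer

    original = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(*args)
    finally:
        sys.settrace(original)
    return covered


def executable_lines(function):
    """Return the line numbers that carry bytecode in the compiled function."""
    return {line for _, line in dis.findlinestarts(function.__code__)
            if line is not None}


def sign(n):
    if n < 0:
        return "negative"
    return "non-negative"


covered = traced_lines(sign, 5)
uncovered = executable_lines(sign) - covered

# Report lines as offsets from the `def` line to stay position-independent
first = sign.__code__.co_firstlineno
uncovered_offsets = sorted(line - first for line in uncovered)
print(uncovered_offsets)
```

With the input `5`, the only line left uncovered is the `return "negative"` branch, two lines below the `def`.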

## Event Differences

In [None]:
c = CoverageCollector()
with c:
    remove_html_markup('abc')
print(c.events())

In [None]:
class StatisticalDebugger():
    def __init__(self, collector_class):
        self.collector_class = collector_class
        self.collectors = {}

In [None]:
class StatisticalDebugger(StatisticalDebugger):
    def collect(self, name, *args):
        collector = self.collector_class(*args)
        if name not in self.collectors:
            self.collectors[name] = []
        self.collectors[name].append(collector)
        return collector

In [None]:
class StatisticalDebugger(StatisticalDebugger):
    def print_events(self):
        """Print a table of events (rows) against collected runs (columns)."""
        all_events = set()
        for name in self.collectors:
            for collector in self.collectors[name]:
                all_events.update(collector.events())

        # Use repr() for both width and output so that columns line up;
        # default=0 avoids a ValueError if no events were collected
        longest_event = max((len(repr(event)) for event in all_events),
                            default=0)

        # Header: one column per collected run, labeled with its outcome
        print(' ' * longest_event, end=" ")
        for name in self.collectors:
            for _ in self.collectors[name]:
                print(name, end=" ")
        print()

        # One row per event: "X" if it occurred in the run, "-" otherwise
        for event in all_events:
            print(repr(event).rjust(longest_event), end=" ")
            for name in self.collectors:
                for collector in self.collectors[name]:
                    print(' ' * (len(name) - 1), end="")
                    if event in collector.events():
                        print("X", end="")
                    else:
                        print("-", end="")
                    print(' ', end="")
            print()

In [None]:
debugger = StatisticalDebugger(CoverageCollector)
with debugger.collect('pass'):
    remove_html_markup('abc')
with debugger.collect('pass'):
    remove_html_markup('<b>abc</b>')
with debugger.collect('fail'):
    remove_html_markup('<b bar="foo">abc</b>')

In [None]:
debugger.print_events()

In [None]:
pass_1_events = debugger.collectors['pass'][0].events()

In [None]:
pass_2_events = debugger.collectors['pass'][1].events()

In [None]:
in_any_pass = pass_1_events | pass_2_events
in_any_pass

In [None]:
fail_events = debugger.collectors['fail'][0].events()

In [None]:
only_in_fail = fail_events - in_any_pass
only_in_fail

In [None]:
show_covered_lines(remove_html_markup, only_in_fail)
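
The set operations above can be illustrated on small literal sets. The line numbers below are made up for illustration and are not taken from `remove_html_markup()`.

```python
# Hypothetical line numbers, chosen only for illustration
pass_1 = {1, 2, 3, 5}
pass_2 = {1, 2, 3, 4, 5}
fail_1 = {1, 2, 3, 4, 6, 7}

in_any_pass = pass_1 | pass_2        # union over all passing runs
only_in_fail = fail_1 - in_any_pass  # executed in the failing run only
only_in_pass = in_any_pass - fail_1  # executed in passing runs only

print(sorted(only_in_fail))  # [6, 7]
print(sorted(only_in_pass))  # [5]
```

Lines that occur in both kinds of runs, such as `1` to `4` here, drop out of the difference.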

In [None]:
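class DifferenceDebugger(StatisticalDebugger):
    """Determine events that occur only in failing runs."""

    PASS = 'pass'
    FAIL = 'fail'

    def collect_pass(self):
        """Return a collector for a passing run."""
        return self.collect(self.PASS)

    def collect_fail(self):
        """Return a collector for a failing run."""
        return self.collect(self.FAIL)

    def pass_collectors(self):
        # .get() yields an empty list if no passing run was collected
        return self.collectors.get(self.PASS, [])

    def fail_collectors(self):
        # .get() yields an empty list if no failing run was collected
        return self.collectors.get(self.FAIL, [])

    def suspicious(self):
        """Return the events observed in failing runs, but in no passing run."""
        in_any_pass = set()
        for collector in self.pass_collectors():
            in_any_pass.update(collector.events())

        in_any_fail = set()
        for collector in self.fail_collectors():
            in_any_fail.update(collector.events())

        return in_any_fail - in_any_pass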

In [None]:
debugger = DifferenceDebugger(CoverageCollector)
with debugger.collect_pass():
    remove_html_markup('abc')
with debugger.collect_pass():
    remove_html_markup('<b>abc</b>')
with debugger.collect_fail():
    remove_html_markup('<b bar="foo">abc</b>')

In [None]:
debugger.suspicious()
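
The same workflow can be tried end to end on a toy function. The sketch below is self-contained and does not use the classes of this chapter: `line_offsets()` and the deliberately buggy `describe()` are hypothetical, and runs are labeled as passing or failing by hand.

```python
import sys


def line_offsets(function, *args):
    """Run function(*args); return executed lines as offsets from its `def`."""
    code = function.__code__
    offsets = set()

    def tracer(frame, event, arg):
        if frame.f_code is code:  # only record the function under test
            offsets.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    original = sys.gettrace()
    sys.settrace(tracer)
    try:
        function(*args)
    finally:
        sys.settrace(original)
    return offsets


def describe(n):
    if n < 0:
        label = "positive"  # bug: should be "negative"
    else:
        label = "non-negative"
    return label


# Two passing runs and one failing run, labeled manually
runs = {
    "pass": [line_offsets(describe, 1), line_offsets(describe, 7)],
    "fail": [line_offsets(describe, -3)],
}

in_any_pass = set().union(*runs["pass"])
in_any_fail = set().union(*runs["fail"])
suspicious = in_any_fail - in_any_pass
print(suspicious)
```

Here the difference contains a single offset, `2`, which is the line holding the faulty assignment. In general, a line executed only in failing runs is a hint rather than proof of a defect.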

## Highlighting Differences

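Next, we turn to the Tarantula method for highlighting differences. Rather than a yes/no verdict, it rates each event by how strongly it is associated with failing runs.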
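
The chapter does not yet spell out the computation. As a sketch under the commonly cited Tarantula definition, the suspiciousness of an event is the fraction of failing runs containing it, divided by the sum of that fraction and the corresponding fraction of passing runs. The function `tarantula_suspiciousness()` and the event sets below are hypothetical, and the sketch assumes at least one passing and one failing run.

```python
def tarantula_suspiciousness(event, pass_runs, fail_runs):
    """Return a value in [0, 1], or None if the event occurs in no run.

    Assumes pass_runs and fail_runs are non-empty lists of event sets.
    """
    passed = sum(event in run for run in pass_runs) / len(pass_runs)
    failed = sum(event in run for run in fail_runs) / len(fail_runs)
    if passed + failed == 0:
        return None  # event was never observed
    return failed / (passed + failed)


# Hypothetical line numbers, chosen only for illustration
pass_runs = [{1, 2, 3, 5}, {1, 2, 3, 4, 5}]
fail_runs = [{1, 2, 3, 4, 6}]

scores = {line: tarantula_suspiciousness(line, pass_runs, fail_runs)
          for line in range(1, 8)}
for line, score in scores.items():
    print(line, score)
```

A score of `1.0` marks an event seen only in failing runs, `0.0` one seen only in passing runs, and `0.5` one that occurs equally often in both; such scores can then be mapped to colors.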

In [None]:
class TarantulaDebugger(StatisticalDebugger):
    """Highlight events by their association with failures (not yet implemented)."""
    pass

## Synopsis

_For those only interested in using the code in this chapter (without wanting to know how it works), give an example.  This will be copied to the beginning of the chapter (before the first section) as text with rendered input and output._

You can use `int_fuzzer()` as:

In [None]:
print(2 + 2)

## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

_Close the chapter with a few exercises such that people have things to do.  To make the solutions hidden (to be revealed by the user), have them start with_

```
**Solution.**
```

_Your solution can then extend up to the next title (i.e., any markdown cell starting with `#`)._

_Running `make metadata` will automatically add metadata to the cells such that the cells will be hidden by default, and can be uncovered by the user.  The button will be introduced above the solution._

### Exercise 1: _Title_

_Text of the exercise_

In [None]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [None]:
# Some code for the solution
2 + 2

_Some more text for the solution_

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_