# Tracing Executions

In this chapter, we show how to observe program state during an execution – a prerequisite for logging and interactive debugging. Thanks to the power of Python, we can do this in a few lines of code.

**Prerequisites**

* You should have read the [Introduction to Debugging](Intro_Debugging.ipynb).
* Knowing a bit of _Python_ is helpful for understanding the code examples in the book.

In [351]:
import bookutils

## Tracing Python Programs

How do debugging tools access the state of a program during execution? For _interpreted_ languages such as Python, this is a fairly simple task. If a language is interpreted, it is typically fairly easy to control execution and to inspect state – since this is what the interpreter is doing already anyway. Debuggers are then implemented in top of _hooks_ that allow to interrupt execution and access program state.

Python makes such a hook available in the function `sys.settrace()`. You invoke it with a *tracing function* that will be called at every line executed, as in

```python
sys.settrace(traceit)
```

Such a tracing function is convenient, as it simply traces _everything_. In contrast to an interactive debugger, where you have to select which aspect of the execution you're interested in, you can just print out a long trace into an *execution log*, to examine it later.

This tracing function takes the format

In [352]:
def traceit(frame, event, arg):
    ...

Here, `event` is a string telling what has happened in the program – for instance,

* `'line'` – a new line is executed
* `'call'` – a function just has been called
* `'return'` – a fucntion returns

The `frame` argument holds the current execution frame – that is, the function and its local variables:

* `frame.f_lineno` – the current line
* `frame.f_locals` – the current variables (as a Python dictionary)
* `frame.f_code` – the current code (as a Code object), with attributes such as
    * `frame.f_code.co_name` – the name of the current function

We can thus get a *trace* of the program by simply printing out these values:

In [353]:
def traceit(frame, event, arg):
    print(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)

The return value of the trace function is the function to be executed at the next event – typically, this is the function itself:

In [354]:
def traceit(frame, event, arg):
    print(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)
    return traceit

Let us try this out on the `remove_html_markup()` function introduced in the [Introduction to Debugging](Intro_Debugging.ipynb):

In [355]:
from Intro_Debugging import remove_html_markup

In [356]:
import inspect
from bookutils import print_content

In [357]:
print_content(content=inspect.getsource(remove_html_markup), filename='.py')

[34mdef[39;49;00m [32mremove_html_markup[39;49;00m(s):
    tag   = [36mFalse[39;49;00m
    quote = [36mFalse[39;49;00m
    out   = [33m"[39;49;00m[33m"[39;49;00m

    [34mfor[39;49;00m c [35min[39;49;00m s:
        [34massert[39;49;00m tag [35mor[39;49;00m [35mnot[39;49;00m quote
        
        [34mif[39;49;00m c == [33m'[39;49;00m[33m<[39;49;00m[33m'[39;49;00m [35mand[39;49;00m [35mnot[39;49;00m quote:
            tag = [36mTrue[39;49;00m
        [34melif[39;49;00m c == [33m'[39;49;00m[33m>[39;49;00m[33m'[39;49;00m [35mand[39;49;00m [35mnot[39;49;00m quote:
            tag = [36mFalse[39;49;00m
        [34melif[39;49;00m (c == [33m'[39;49;00m[33m"[39;49;00m[33m'[39;49;00m [35mor[39;49;00m c == [33m"[39;49;00m[33m'[39;49;00m[33m"[39;49;00m) [35mand[39;49;00m tag:
            quote = [35mnot[39;49;00m quote
        [34melif[39;49;00m [35mnot[39;49;00m tag:
            out = out + c
    
    [34mreturn[39;49;00

We define a variant `remove_html_markup_traced()` which turns on tracing, invokes `remove_html_markup()`, and turns tracing off again.

In [358]:
import sys

In [359]:
def remove_html_markup_traced(s):
    sys.settrace(traceit)
    ret = remove_html_markup(s)
    sys.settrace(None)
    return ret

In [360]:
remove_html_markup_traced('xyz')

call 222 remove_html_markup {'s': 'xyz'}
line 223 remove_html_markup {'s': 'xyz'}
line 224 remove_html_markup {'s': 'xyz', 'tag': False}
line 225 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False}
line 227 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': ''}
line 228 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 230 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 232 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 234 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 236 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 237 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 227 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'x'}
line 228 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': F

'xyz'

In this very raw format, we can see how the execution progresses through the function. The variable `c` takes one character of the input string at a time; the `out` variable accumulates them. The argument `s` and the `tag` and `quote` flags stay unchanged throughout the execution.

## A Tracer Class

Let us refine our tracing function a bit. First, it would be nice if one could actually _customize_ tracing just as needed. To this end, we introduce a `Tracer` class that does all the formatting with us, and which can be _subclassed_ to allow for different output formats.

The `traceit()` method is the same as above, and again is added set up via `sys.settrace()`. Its typical usage, however, is as follows:

```python
with Tracer():
    # Code to be traced
    ...

# Code no longer traced
...
```

When the `with` statement is encountered, the `__enter__()` method is called, which starts tracing. When the `with` block ends, the `__exit__()` method is called, and tracing is turned off. We take special care that the internal `__exit__()` method is not part of the trace, and that any other tracing function that was active before is being restored.

In [361]:
class Tracer(object):
    def __init__(self):
        self.original_trace_function = None
        pass

    def traceit(self, frame, event, arg):
        print(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)

    def _traceit(self, frame, event, arg):
        if frame.f_code.co_name == '__exit__':
            # Do not trace our own __exit__() method
            pass
        else:
            self.traceit(frame, event, arg)
        return self._traceit

    def __enter__(self):  # Begin of "with" block
        self.original_trace_function = sys.gettrace()
        sys.settrace(self._traceit)

    def __exit__(self, tp, value, traceback):   # End of "with" block
        sys.settrace(self.original_trace_function)

Here's how we use the `Tracer` class. You see that everything works as before, except that it is nicer to use:

In [362]:
def remove_html_markup_traced(s):
    with Tracer():
        ret = remove_html_markup(s)
    return ret

In [363]:
remove_html_markup_traced("abc")

call 222 remove_html_markup {'s': 'abc'}
line 223 remove_html_markup {'s': 'abc'}
line 224 remove_html_markup {'s': 'abc', 'tag': False}
line 225 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False}
line 227 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': ''}
line 228 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}
line 230 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}
line 232 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}
line 234 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}
line 236 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}
line 237 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': '', 'c': 'a'}
line 227 remove_html_markup {'s': 'abc', 'tag': False, 'quote': False, 'out': 'a', 'c': 'a'}
line 228 remove_html_markup {'s': 'abc', 'tag': False, 'quote': F

'abc'

## Accessing Source Code

We cam now go and _extend_ the class with additional features. It would be nice if it could actually display the source code of the function being tracked, such that we know where we are. In Python, the function `inspect.getsource()` returns the source code of a function or module. Looking up

```python
module = inspect.getmodule(frame.f_code)
```

gives us the current module, and

```python
inspect.getsource(module)
```

gives us its source code. All we then have to do is to retrieve the current line.

To implement our extended `traceit()` method, we use a bit of a hack. The Python language requires us to define an entire class with all methods as a single, continuous unit; however, we would like to introduce one method after another.  To avoid this problem, we use a special hack: Whenever we want to introduce a new method to some class `C`, we use the construct

```python
class C(C):
    def new_method(self, args):
        pass
```

This seems to define `C` as a subclass of itself, which would make no sense – but actually, it introduces a new `C` class as a subclass of the _old_ `C` class, and then shadowing the old `C` definition.  What this gets us is a `C` class with `new_method()` as a method, which is just what we want.  (`C` objects defined earlier will retain the earlier `C` definition, though, and thus must be rebuilt.)

Using this hack, we can now redefine the `traceit()` method. Our new tracer shows the current line as it is executed.

In [364]:
import inspect

In [365]:
class Tracer(Tracer):
    def traceit(self, frame, event, arg):
        if event == 'line':
            module = inspect.getmodule(frame.f_code)
            source = inspect.getsource(module)
            current_line = source.split('\n')[frame.f_lineno - 2]
            print(frame.f_lineno, current_line)

        return traceit

In [366]:
remove_html_markup_traced("abc")

223     tag   = False
224     quote = False
225     out   = ""
227     for c in s:
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
227     for c in s:
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
227     for c in s:
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
227     for c in s:
239     return out


'abc'

## Tracing Calls and Returns

Next, we'd like to report calling and returning from functions. For the `return` event, `arg` holds the value being returned.

In [367]:
class Tracer(Tracer):
    def traceit(self, frame, event, arg):
        if event == 'call':
            print(f"Calling {frame.f_code.co_name}()")

        if event == 'line':
            module = inspect.getmodule(frame.f_code)
            source = inspect.getsource(module)
            current_line = source.split('\n')[frame.f_lineno - 2]
            print(frame.f_lineno, current_line)

        if event == 'return':
            print(f"{frame.f_code.co_name}() returns {repr(arg)}")

        return traceit

In [368]:
remove_html_markup_traced("abc")

Calling remove_html_markup()
223     tag   = False
224     quote = False
225     out   = ""
227     for c in s:
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
227     for c in s:
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
227     for c in s:
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
227     for c in s:
239     return out
remove_html_markup() returns 'abc'


'abc'

## Tracing Variables

Finally, we'd again like to report variables – but only those that have changed. To this end, we save a copy of the last reported variables in the class, reporting only the changed values.

In [369]:
class Tracer(Tracer):
    def __init__(self):
        self.last_vars = {}

    def changed_vars(self, new_vars):
        changed = {}
        for var_name in new_vars:
            if var_name not in self.last_vars or self.last_vars[var_name] != new_vars[var_name]:
                changed[var_name] = new_vars[var_name]
        self.last_vars = new_vars.copy()
        return changed

Here's how this works: If variable `a` is set to 10 (and we didn't have it so far), it is marked as changed:

In [370]:
t = Tracer()

In [371]:
t.changed_vars({'a': 10})

{'a': 10}

If another variable `b` is added, and only `b` is changed, then only `b` is marked as changed:

In [372]:
t.changed_vars({'a': 10, 'b': 25})

{'b': 25}

If both variables keep their values, nothing changes:

In [373]:
t.changed_vars({'a': 10, 'b': 25})

{}

But if new variables come along, they are listed again.

In [374]:
changes = t.changed_vars({'c': 10, 'd': 25})
changes

{'c': 10, 'd': 25}

The following expression creates a comma-separated list of variables and values:

In [375]:
", ".join([var + " = " + repr(changes[var]) for var in changes])

'c = 10, d = 25'

We can now put all of this together in our tracing function, reporting any variable changes as we see them. Note how we exploit the fact that in a call, all variables have a "new" value; and when we return from a function, we explicitly delete the "last" variables.

In [376]:
class Tracer(Tracer):
    def print_debugger_status(self, frame, event, arg):
        changes = self.changed_vars(frame.f_locals)
        changes_s = ", ".join([var + " = " + repr(changes[var]) for var in changes])

        if event == 'call':
            print("Calling " + frame.f_code.co_name + '(' + changes_s + ')')
        elif changes:
            print(' ' * 40, '#', changes_s)

        if event == 'line':
            module = inspect.getmodule(frame.f_code)
            source = inspect.getsource(module)
            current_line = source.split('\n')[frame.f_lineno - 2]
            print(repr(frame.f_lineno) + ' ' + current_line)

        if event == 'return':
            print(frame.f_code.co_name + '()' + " returns " + repr(arg))
            self.last_vars = {}  # Delete 'last' variables

    def traceit(self, frame, event, arg):
        self.print_debugger_status(frame, event, arg)

Here's the resulting trace of `remove_html_markup()` for a more complex input. You can see that the tracing output allows us to see which lines are executed as well as the variables whose value changes.

In [377]:
remove_html_markup_traced('<b>x</b>')

Calling remove_html_markup(s = '<b>x</b>')
223     tag   = False
                                         # tag = False
224     quote = False
                                         # quote = False
225     out   = ""
                                         # out = ''
227     for c in s:
                                         # c = '<'
228         assert tag or not quote
230         if c == '<' and not quote:
231             tag = True
                                         # tag = True
227     for c in s:
                                         # c = 'b'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
227     for c in s:
                                         # c = '>'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
233             tag = False
                          

'x'

As you see, even a simple function can create a long execution log. Hence, we will now explore how to focus tracing on particular _events_.

## Conditional Tracing

A log such as the above can very quickly become very messy – notably if executions take a long time, or if data structures become very complex. If one of our local variables were a list with 1,000 entries for instance, and were changed with each line, we'd be printing out the entire list with 1,000 entries for each step. 

We could still load the log into, say, a text editor or a database and then search for specific values, but this is still cumbersome. A better alternative, however, is to have our tracer only log while specific _conditions_ hold.

To this end, we introduce a class `ConditionalTracer`, which gets a list of expressions to be checked during executions. Only if these expressions hold do we list the current status. With

```python
with ConditionalTracer('c == "z"'):
    remove_html_markup(...)
```

we would obtain only the lines executed while `c` gets a value of `'z'`, and with

```python
with ConditionalTracer('quote'):
    remove_html_markup(...)
```

we would obtain only the lines executed while `quote` is True. If we have multiple conditions, we can combine them into one using `and`, `or`, or `not`.

Our `ConditionalTracer` class stores the condition in its `condition` attribute:

In [378]:
class ConditionalTracer(Tracer):
    def __init__(self, condition=None):
        if condition is None:
            condition = "False"
        self.condition = condition
        self.last_report = None
        super().__init__()

Its `traceit()` function _evaluates_ `condition` and reports the current line only if it holds. To this end, it uses the Python `eval()` function which evaluates the condition using the local variables of the program under test. If the condition gets set, we print out three dots to indicate the elapsed time.

In [379]:
class ConditionalTracer(ConditionalTracer):
    def eval_in_context(self, expr, frame):
        try:
            cond = eval(expr, None, frame.f_locals)
        except:  # Errors evaluate to `None`
            cond = None
        return cond
    
    def do_report(self, frame, event, arg):
        return self.eval_in_context(self.condition, frame)

    def traceit(self, frame, event, arg):
        report = self.do_report(frame, event, arg)
        if report != self.last_report:
            if report:
                print("...")
            self.last_report = report

        if report:
            self.print_debugger_status(frame, event, arg)

Here's an example. We see that `quote` is set only while the three characters `b`, `a`, and `r` are processed (as should be).

In [380]:
with ConditionalTracer(condition='quote'):
    remove_html_markup('<b title="bar">"foo"</b>')

...
                                         # s = '<b title="bar">"foo"</b>', tag = True, quote = True, out = '', c = '"'
227     for c in s:
                                         # c = 'b'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
227     for c in s:
                                         # c = 'a'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
227     for c in s:
                                         # c = 'r'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
227     for c in s:
                                         # c = '"'
228         assert t

We can also have the log focus on particular code locations only. To this end, we add the variables `function` and `line` to our evaluation context, which can be used within our condition to refer to the current function name or line. Then, we invoke the original `eval_cond()` as above.

In [381]:
class ConditionalTracer(ConditionalTracer):
    def eval_in_context(self, expr, frame):
        frame.f_locals['function'] = frame.f_code.co_name
        frame.f_locals['line'] = frame.f_lineno
                    
        return super().eval_in_context(expr, frame)

Again, here is an example. We focus on the parts of the function where the `out` variable is being set:

In [382]:
with ConditionalTracer("function == 'remove_html_markup' and line >= 237"):
    remove_html_markup('xyz')

...
                                         # s = 'xyz', function = 'remove_html_markup', line = 237, tag = False, quote = False, out = '', c = 'x'
237             out = out + c
...
                                         # out = 'x', c = 'y'
237             out = out + c
...
                                         # out = 'xy', c = 'z'
237             out = out + c
...
                                         # line = 239, out = 'xyz'
239     return out
remove_html_markup() returns 'xyz'


Using `line` and `function` in conditions is equivalent to conventional _breakpoints_ in interactive debuggers. We will reencounter them in the next chapter.

## Watching Events

As an alternative to conditional logging, we may also be interested to exactly trace when a variable not only _has_ a particular value, but also when it _changes_ its value.

To this end, we set up an `EventTracer` class that _watches_ when some event takes place. It takes a list of expressions ("events") and evaluates them for each line; if any event changes its value, we log the status.

With

```python
with EventTracer(events=['tag', 'quote']):
    remove_html_markup(...)
```

for instance, we would get a listing of all lines where `tag` or `quote` change their value; and with

```python
with EventTracer(events=['function']):
    remove_html_markup(...)
```

we would obtain a listing of all lines where the current function changes.

Our `EventTracer` class stores the list of events in its `events` attribute:

In [383]:
class EventTracer(ConditionalTracer):
    def __init__(self, condition=None, events=[]):
        self.events = events
        self.last_event_values = {}
        super().__init__(condition=condition)

Its `traceit()` function _evaluates_ the individual events and checks if they change.

In [384]:
class EventTracer(EventTracer):
    def events_changed(self, events, frame):
        change = False
        for event in events:
            value = self.eval_in_context(event, frame)

            if (event not in self.last_event_values or 
                value != self.last_event_values[event]):
                self.last_event_values[event] = value
                change = True
                # print(f"New value for {event}: {repr(value)}")

        return change

In [385]:
class EventTracer(EventTracer):
    def do_report(self, frame, event, arg):
        return (self.eval_in_context(self.condition, frame) or
                self.events_changed(self.events, frame))

In [386]:
with EventTracer(events=['tag']):
    remove_html_markup('<b title="bar">"foo"</b>')

...
Calling remove_html_markup(s = '<b title="bar">"foo"</b>', function = 'remove_html_markup', line = 222)
...
                                         # line = 224, tag = False
224     quote = False
...
                                         # line = 227, tag = True, quote = False, out = '', c = '<'
227     for c in s:
...
                                         # tag = False, c = '>'
227     for c in s:
...
                                         # tag = True, out = '"foo"', c = '<'
227     for c in s:
...
                                         # tag = False, c = '>'
227     for c in s:


With this, we have all we need for observing what happens during execution: We can explore the entire state, and we can evaluate conditions and events we are interested in. In the next chapter, we will see how to turn these capabilities into an interactive debugger, where we can query all these things interactively.

## Efficient Tracing

## Synopsis

_For those only interested in using the code in this chapter (without wanting to know how it works), give an example.  This will be copied to the beginning of the chapter (before the first section) as text with rendered input and output._

You can use `int_fuzzer()` as:

In [390]:
print(int_fuzzer())

76.5


## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

_Close the chapter with a few exercises such that people have things to do.  To make the solutions hidden (to be revealed by the user), have them start with_

```
**Solution.**
```

_Your solution can then extend up to the next title (i.e., any markdown cell starting with `#`)._

_Running `make metadata` will automatically add metadata to the cells such that the cells will be hidden by default, and can be uncovered by the user.  The button will be introduced above the solution._

### Exercise 1: _Title_

_Text of the exercise_

In [391]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [392]:
# Some code for the solution
2 + 2

4

_Some more text for the solution_

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_