# How Debuggers Work

In this chapter, we show how _interactive debuggers_ work – tools that allow you to observe the program state during an execution.  Thanks to the power of Python, we can even build our own debugger in a few lines of code.

**Prerequisites**

* You should have read the [Introduction to Debugging](Intro_Debugging.ipynb).
* Knowing a bit of _Python_ is helpful for understanding the code examples in the book.

In [1059]:
import bookutils

## Debugger Features

_Interactive Debuggers_ (or *debuggers* for short) are tools that allow you to observe program executions. A debugger typically offers the following features:

* _Run_ the program
* Define a condition under which the execution should _stop_ and hand over control to the debugger. Conditions include
    * a particular location is reached
    * a particular variable takes a particular value
    * or some other condition of choice.
* When the program stops, you can _observe_ the current state, including
    * the current location
    * variables and their values
    * the current function and its callers
* When the program stops, you can _step_ through program execution, having it stop at the next instruction again.
* Finally, you can also _resume_ execution to the next stop.

These commands are typically used in a loop. First, you identify the location(s) you want to inspect, and tell the debugger to stop execution once one of these locations is reached. Then you have the debugger run the program. When it stops at the given location, you inspect the state (and check whether things are as expected). You can then step through the program or define new stop conditions and resume execution.

This functionality can come as a _command-line interface_, typing commands at a prompt, or as a _graphical user interface_, selecting commands from the screen. Debuggers can come as standalone tools, or be integrated into a programming environment of choice.

## Tracing Executions

We will first explore how debuggers are implemented for _interpreted_ languages such as Python. If a language is interpreted, it is typically fairly easy to control execution and to inspect state – since this is what the interpreter is doing already anyway. Debuggers are then implemented on top of _hooks_ that allow one to interrupt execution and access program state.

Python makes such a hook available in the function `sys.settrace()`. You invoke it with a *tracing function* that will be called at every line executed, as in

```python
sys.settrace(traceit)
```

Such a tracing function is convenient, as it simply traces _everything_. In contrast to an interactive debugger, where you have to select which aspect of the execution you're interested in, you can just print out a long trace into an *execution log*, to examine it later.

This tracing function takes the format

In [1060]:
def traceit(frame, event, arg):
    ...

Here, `event` is a string telling what has happened in the program – for instance,

* `'line'` – a new line is executed
* `'call'` – a function has just been called
* `'return'` – a function returns

The `frame` argument holds the current execution frame – that is, the function and its local variables:

* `frame.f_lineno` – the current line
* `frame.f_locals` – the current variables (as a Python dictionary)
* `frame.f_code` – the current code (as a Code object), with attributes such as
    * `frame.f_code.co_name` – the name of the current function
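
Frames are also linked to their callers: each frame has an `f_back` attribute that refers to the frame of the calling function (or `None` at the outermost level). As a minimal sketch of how a debugger could report "the current function and its callers", the following code walks this chain. The names `call_chain()`, `inner()`, and `outer()` are hypothetical and serve only as illustration.

```python
import inspect

def call_chain(frame):
    """Return the names of the function in `frame` and all its callers."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back  # move on to the caller's frame
    return names

def inner():
    # inspect.currentframe() returns the frame of the running function
    return call_chain(inspect.currentframe())

def outer():
    return inner()

chain = outer()
print(chain[:2])  # innermost function first, then its caller
```

The remaining entries of `chain` depend on the environment in which the code runs (for instance, a notebook adds its own frames).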

We can thus get a *trace* of the program by simply printing out these values:

In [1061]:
def traceit(frame, event, arg):
    print(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)

The return value of the trace function is the function to be executed at the next event – typically, this is the function itself:

In [1062]:
def traceit(frame, event, arg):
    print(event, frame.f_lineno, frame.f_code.co_name, frame.f_locals)
    return traceit
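
To see the event sequence in isolation, here is a small self-contained sketch that records events in a list rather than printing them. The function `square()` and the list `events` are made up for this illustration; line numbers are stored relative to the start of the function (using `co_firstlineno`) so that they do not depend on where the code is placed.

```python
import sys

events = []  # collected (event, function name, relative line) triples

def collect(frame, event, arg):
    # Line number relative to the 'def' line of the traced function
    relative_line = frame.f_lineno - frame.f_code.co_firstlineno
    events.append((event, frame.f_code.co_name, relative_line))
    return collect  # keep tracing subsequent events in this frame

def square(x):
    y = x * x
    return y

sys.settrace(collect)
result = square(3)
sys.settrace(None)

for entry in events:
    print(entry)
```

Each call produces one `'call'` event, one `'line'` event per executed line, and a final `'return'` event.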

Let us try this out on the `remove_html_markup()` function introduced in the [Introduction to Debugging](Intro_Debugging.ipynb):

In [1063]:
from Intro_Debugging import remove_html_markup

In [1064]:
import inspect
from bookutils import print_content

In [1065]:
print_content(content=inspect.getsource(remove_html_markup), filename='.py')

def remove_html_markup(s):
    tag   = False
    quote = False
    out   = ""

    for c in s:
        assert tag or not quote
        
        if c == '<' and not quote:
            tag = True
        elif c == '>' and not quote:
            tag = False
        elif (c == '"' or c == "'") and tag:
            quote = not quote
        elif not tag:
            out = out + c
    
    return

We define a variant `remove_html_markup_traced()` which turns on tracing, invokes `remove_html_markup()`, and turns tracing off again.

In [1066]:
import sys

In [1067]:
def remove_html_markup_traced(s):
    sys.settrace(traceit)
    ret = remove_html_markup(s)
    sys.settrace(None)
    return ret
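
If the traced function raises an exception, a wrapper of this kind would leave tracing switched on. One possible way to guard against this, sketched here under the assumption that we also want to restore whatever trace function was active before, is a `try`/`finally` block. The helper `traced_call()` and the demo functions are hypothetical names, not part of the chapter's code.

```python
import sys

def traced_call(tracer, func, *args, **kwargs):
    """Call func(*args, **kwargs) under `tracer`; restore the previous tracer afterwards."""
    old_tracer = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func(*args, **kwargs)
    finally:
        # Runs on normal return as well as when an exception propagates
        sys.settrace(old_tracer)

seen = []  # names of functions whose calls were traced

def record_calls(frame, event, arg):
    if event == 'call':
        seen.append(frame.f_code.co_name)
    return record_calls

def failing():
    raise ValueError("oops")

try:
    traced_call(record_calls, failing)
except ValueError:
    pass  # the exception still reaches the caller

print(seen)
print(sys.gettrace() is record_calls)
```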

In [1068]:
remove_html_markup_traced('xyz')

call 222 remove_html_markup {'s': 'xyz'}
line 223 remove_html_markup {'s': 'xyz'}
line 224 remove_html_markup {'s': 'xyz', 'tag': False}
line 225 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False}
line 227 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': ''}
line 228 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 230 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 232 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 234 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 236 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 237 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': '', 'c': 'x'}
line 227 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': False, 'out': 'x', 'c': 'x'}
line 228 remove_html_markup {'s': 'xyz', 'tag': False, 'quote': F

'xyz'

In this very raw format, we can see how the execution progresses through the function. The variable `c` takes one character of the input string at a time; the `out` variable accumulates them. The argument `s` and the `tag` and `quote` flags stay unchanged throughout the execution.

Let us refine our tracing function a bit. First, it would be nice if it could actually display the source code of the function being traced, so that we know where we are. In Python, the function `inspect.getsource()` returns the source code of a function or module. Looking up

```python
module = inspect.getmodule(frame.f_code)
```

gives us the current module, and

```python
inspect.getsource(module)
```

gives us its source code. All we then have to do is to retrieve the current line.

In [1069]:
import inspect

In [1070]:
def traceit(frame, event, arg):
    if event == 'line':
        module = inspect.getmodule(frame.f_code)
        source = inspect.getsource(module)
        current_line = source.split('\n')[frame.f_lineno - 2]
        print(frame.f_lineno, current_line)

    return traceit
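
The mapping from a line number to a source line can be tried out without any files. In the following sketch, the code to be traced is compiled from a string, so we know its text exactly; `SOURCE`, `double()`, and the file name `"<demo>"` are assumptions made for this example. Line numbers in `f_lineno` start at 1, whereas list indexes start at 0. Which offset is right therefore depends on how the retrieved source text lines up with the reported line numbers; here, the source string starts at line 1, so the index is `f_lineno - 1`.

```python
import sys

SOURCE = """def double(n):
    result = n + n
    return result
"""
source_lines = SOURCE.splitlines()

# Compile the string under a made-up file name, so we can recognize its frames
namespace = {}
exec(compile(SOURCE, "<demo>", "exec"), namespace)

listing = []  # (line number, source line) pairs

def show_line(frame, event, arg):
    if event == 'line' and frame.f_code.co_filename == "<demo>":
        # f_lineno is 1-based; list indexes are 0-based
        listing.append((frame.f_lineno, source_lines[frame.f_lineno - 1]))
    return show_line

sys.settrace(show_line)
namespace["double"](4)
sys.settrace(None)

for lineno, text in listing:
    print(lineno, text)
```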

Next, we'd like to report calling and returning from functions. For the `return` event, `arg` holds the value being returned.

In [1071]:
def traceit(frame, event, arg):
    if event == 'call':
        print("Calling", frame.f_code.co_name + '()')

    if event == 'line':
        module = inspect.getmodule(frame.f_code)
        source = inspect.getsource(module)
        current_line = source.split('\n')[frame.f_lineno - 2]
        print(frame.f_lineno, current_line)

    if event == 'return':
        print(frame.f_code.co_name + '()', "returns", repr(arg))

    return traceit

Finally, we'd like to report only those variables that have changed. To this end, we save a copy of the last reported variables in a global variable, reporting only the changed values.

In [1072]:
last_vars = {}

In [1073]:
def changed_vars(new_vars):
    changed = {}
    global last_vars
    for var_name in new_vars:
        if var_name not in last_vars or last_vars[var_name] != new_vars[var_name]:
            changed[var_name] = new_vars[var_name]
    last_vars = new_vars.copy()
    return changed

Here's how this works: If variable `a` is set to 10 (and we didn't have it so far), it is marked as changed:

In [1074]:
changed_vars({'a': 10})

{'a': 10}

If another variable `b` is added, and only `b` is changed, then only `b` is marked as changed:

In [1075]:
changed_vars({'a': 10, 'b': 25})

{'b': 25}

If both variables keep their values, nothing changes:

In [1076]:
changed_vars({'a': 10, 'b': 25})

{}

But if new variables come along, they are listed again.

In [1077]:
changes = changed_vars({'c': 10, 'd': 25})
changes

{'c': 10, 'd': 25}

The following expression creates a comma-separated list of variables and values:

In [1078]:
", ".join([var + " = " + repr(changes[var]) for var in changes])

'c = 10, d = 25'
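
One limitation of `changed_vars()` is worth noting: `new_vars.copy()` is a shallow copy, so the saved dictionary refers to the very same objects as the program. If a list is modified in place, the saved value changes along with it, and no difference is detected. A possible variant, sketched here with a hypothetical `make_change_tracker()` helper, keeps a deep copy instead and stores its state in a closure rather than in a global variable. This assumes that all variable values can be deep-copied, which does not hold for every Python object.

```python
import copy

def make_change_tracker():
    """Return a function that reports variables changed since its last call."""
    last = {}

    def changed(new_vars):
        nonlocal last
        diff = {name: value for name, value in new_vars.items()
                if name not in last or last[name] != value}
        last = copy.deepcopy(new_vars)  # snapshot independent of later mutations
        return diff

    return changed

track = make_change_tracker()
items = [1]
track({'items': items})           # first sighting: reported as changed
items.append(2)                   # in-place modification
second = track({'items': items})  # detected thanks to the deep copy
third = track({'items': items})   # nothing changed since the last call
print(second, third)
```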

We can now put all of this together in our tracing function, reporting any variable changes as we see them. Note how we exploit the fact that in a call, all variables have a "new" value; and when we return from a function, we explicitly delete the "last" variables.

In [1079]:
def print_debugger_status(frame, event, arg):
    changes = changed_vars(frame.f_locals)
    changes_s = ", ".join([var + " = " + repr(changes[var]) for var in changes])

    if event == 'call':
        print("Calling " + frame.f_code.co_name + '(' + changes_s + ')')
    elif changes:
        print(' ' * 40, '#', changes_s)
  
    if event == 'line':
        module = inspect.getmodule(frame.f_code)
        source = inspect.getsource(module)
        current_line = source.split('\n')[frame.f_lineno - 2]
        print(repr(frame.f_lineno) + ' ' + current_line)

    if event == 'return':
        print(frame.f_code.co_name + '()' + " returns " + repr(arg))
        global last_vars
        last_vars = {}  # Delete 'last' variables

In [1080]:
def traceit(frame, event, arg):
    print_debugger_status(frame, event, arg)
    return traceit

Here's the resulting trace of `remove_html_markup()` for a more complex input. You can see that the tracing output allows us to see which lines are executed as well as the variables whose value changes.

In [1081]:
remove_html_markup_traced('<b>x</b>')

Calling remove_html_markup(s = '<b>x</b>')
223     tag   = False
                                         # tag = False
224     quote = False
                                         # quote = False
225     out   = ""
                                         # out = ''
227     for c in s:
                                         # c = '<'
228         assert tag or not quote
230         if c == '<' and not quote:
231             tag = True
                                         # tag = True
227     for c in s:
                                         # c = 'b'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
227     for c in s:
                                         # c = '>'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
233             tag = False
                          

'x'

As you see, even a simple function can create a long execution log. Hence, we will now explore how to make this more selective and interactive.

## Debugger Interaction

The key idea of an _interactive_ debugger is to set up the tracing function such that it actually _asks_ what to do next, prompting you to enter a _command_. For the sake of simplicity, we collect such a command from a command line, using the Python `input()` function. The following code prompts you to enter a command:

In [1082]:
def remove_html_markup_debugged(s):
    sys.settrace(debugit)
    ret = remove_html_markup(s)
    sys.settrace(None)
    return ret

In [1083]:
INPUTS = []

In [1084]:
from bookutils import HTML

In [1085]:
import readline

In [1086]:
def my_input(prompt):
    given_input = None
    try:
        global INPUTS
        given_input = INPUTS[0]
        INPUTS = INPUTS[1:]
    except IndexError:  # no simulated inputs left
        pass
    
    if given_input:
        display(HTML(f"<pre>{prompt}{given_input}</pre>"))
        return given_input
    
    return input(prompt)

In [1087]:
stepping = True
breakpoints = set()

In [1088]:
def debugit(frame, event, arg):
    if stepping or frame.f_lineno in breakpoints:
        print_debugger_status(frame, event, arg)
        status = False
    
        interact = True
        while interact:
            command = my_input("(debugger) ")
            interact = debug(command, frame)

    return debugit
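
The stop condition need not be a line number. As a rough sketch of the "variable takes a particular value" kind of condition, the following self-contained tracer evaluates an expression in the local variables of the traced function and records the first moment it becomes true. The names `CONDITION`, `watch()`, and `accumulate()` are invented for this example; a real debugger would start interacting with the user instead of appending to a list.

```python
import sys

CONDITION = "total > 3"  # hypothetical stop condition
stops = []               # snapshots of local variables at the first stop

def watch(frame, event, arg):
    if event == 'line' and frame.f_code.co_name == 'accumulate' and not stops:
        try:
            if eval(CONDITION, {}, frame.f_locals):
                stops.append(dict(frame.f_locals))
        except NameError:
            pass  # a variable in the condition is not defined yet
    return watch

def accumulate(values):
    total = 0
    for v in values:
        total += v
    return total

sys.settrace(watch)
accumulate([1, 2, 3])
sys.settrace(None)

print(stops[0])
```

Since `'line'` events fire before a line is executed, the condition is first seen as true at the line following the assignment that made it true.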

In [1089]:
def debug(command, frame):
    global stepping
    global breakpoints
    vars = frame.f_locals
    
    if command.find(' ') > 0:
        arg = command[command.find(' ') + 1:]
    else:
        arg = None

    if command.startswith('s'):     # step
        stepping = True
        return False

    if command.startswith('c'):   # continue
        stepping = False
        return False

    if command.startswith('l'):   # list
        print_debugger_status(frame, 'line', None)
        return True
    
    if command.startswith('p'):   # print
        if arg is None:
            print("\n".join([f"{var} = {repr(vars[var])}" for var in vars]))
            return True
        
        try:
            print(f"{arg} = {repr(eval(arg, globals(), vars))}")
        except Exception as err:
            print(f"{err.__class__.__name__}: {err}")
        return True

    if command.startswith('b'):   # break
        if arg:
            breakpoints.add(int(arg))
        print("Breakpoints:", breakpoints)
        return True
    
    if command.startswith('q'):   # quit
        breakpoints = set()  # keep this a set, as 'break' uses add()
        stepping = False
        return False
    
    if command.startswith('#'):   # comment
        return True

    print("No such command:", repr(command))
    return True
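
To illustrate the interplay of stepping, printing, and continuing without typing anything, here is a condensed, self-contained sketch of the same idea. Commands come from a pre-filled list (`scripted_commands`) instead of `input()`, and output goes into a `transcript` list; all names in this block are hypothetical and the command set is reduced to `step`, `print <name>`, and `continue`.

```python
import sys

scripted_commands = ["step", "print x", "continue"]  # stand-in for user input
transcript = []                                      # what the debugger would print
stepping = True

def next_command():
    # Fall back to 'continue' once the script is exhausted
    return scripted_commands.pop(0) if scripted_commands else "continue"

def mini_debugit(frame, event, arg):
    global stepping
    if event == 'line' and stepping:
        while True:
            command = next_command()
            if command.startswith('p'):    # print <name>: stay at this line
                name = command.split(maxsplit=1)[1]
                local_vars = dict(frame.f_locals)
                transcript.append(f"{name} = {local_vars.get(name)!r}")
            elif command.startswith('c'):  # continue: stop interacting for good
                stepping = False
                break
            else:                          # anything else is treated as 'step'
                break
    return mini_debugit

def target(x):
    y = x + 1
    return y * 2

sys.settrace(mini_debugit)
result = target(5)
sys.settrace(None)

print(transcript, result)
```

Here, `step` leaves the inner loop so that execution proceeds to the next line, whereas `print` keeps the loop going, mirroring the `False`/`True` return values of `debug()`.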

In [1090]:
remove_html_markup_traced("abc")

Calling remove_html_markup(s = 'abc')
223     tag   = False
                                         # tag = False
224     quote = False
                                         # quote = False
225     out   = ""
                                         # out = ''
227     for c in s:
                                         # c = 'a'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
                                         # out = 'a'
227     for c in s:
                                         # c = 'b'
228         assert tag or not quote
230         if c == '<' and not quote:
232         elif c == '>' and not quote:
234         elif (c == '"' or c == "'") and tag:
236         elif not tag:
237             out = out + c
                                         # out = 'ab'
227     for c in s:
                        

'abc'

In [1091]:
class Debugger:
    def __init__(self):
        self.stepping = True
        self.breakpoints = set()
        self.interact = True
        self.frame = None  # current frame; assumed to be set by the tracing function

    def step_command(self, arg=""):
        """Execute up to the next line"""
        self.stepping = True
        self.interact = False

    def continue_command(self, arg=""):
        """Resume execution"""
        self.stepping = False
        self.interact = False

    def list_command(self, arg=""):
        """Show current line"""
        print_debugger_status(self.frame, 'line', None)
    
    def print_command(self, arg=""):
        """Print an expression. If no expression is given, print all variables"""
        local_vars = self.frame.f_locals
        if not arg:
            print("\n".join([f"{var} = {repr(local_vars[var])}" for var in local_vars]))
        else:
            try:
                print(f"{arg} = {repr(eval(arg, globals(), local_vars))}")
            except Exception as err:
                print(f"{err.__class__.__name__}: {err}")

    def break_command(self, arg=""):
        """Set a breakpoint in given line. If no line is given, print all breakpoints"""
        if arg:
            self.breakpoints.add(int(arg))
        print("Breakpoints:", self.breakpoints)

    def clear_command(self, arg=""):
        """Clear a breakpoint in given line. If no line is given, clear all breakpoints"""
        if arg:
            self.breakpoints.discard(int(arg))
        else:
            self.breakpoints = set()
        print("Breakpoints:", self.breakpoints)

    def finish_command(self, arg=""):
        """Finish execution"""
        self.breakpoints = set()
        self.stepping = False
        self.interact = False
        
    def help_command(self, command=""):
        """Give help on given command. If no command is given, give help on all"""
        
        if command:
            possible_cmds = [possible_cmd for possible_cmd in self.commands() if possible_cmd.startswith(command)]

            if len(possible_cmds) == 0:
                print(f"Unknown command {repr(command)}. Possible commands are:")
                possible_cmds = self.commands()
            elif len(possible_cmds) > 1:
                print(f"Ambiguous command {repr(command)}. Possible expansions are:")
                
        else:
            possible_cmds = self.commands()

        for cmd in possible_cmds:
            method = self.command_method(cmd)
            print(f"{cmd:10} -- {method.__doc__}")

    def command_method(self, command):
        if command.startswith('#'):
            return None

        possible_cmds = [possible_cmd for possible_cmd in self.commands() if possible_cmd.startswith(command)]
        if len(possible_cmds) != 1:
            self.help_command(command)
            return None
        
        cmd = possible_cmds[0]
        return getattr(self, cmd + '_command')
        
    def execute(self, command):
        sep = command.find(' ')
        if sep > 0:
            cmd = command[:sep].strip()
            arg = command[sep + 1:].strip()
        else:
            cmd = command.strip()
            arg = ""

        method = self.command_method(cmd)
        if method:
            method(arg)

    def commands(self):
        cmds = [method.replace('_command', '') for method in dir(self.__class__) if method.endswith('_command')]
        cmds.sort()
        return cmds
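
The dispatch mechanism of this class, finding methods by naming convention and accepting any unambiguous prefix, can be studied in isolation. The following stripped-down sketch uses a hypothetical `PrefixDispatcher` class with three made-up commands; unlike the `Debugger` class, it simply returns `None` for unknown or ambiguous input instead of printing help.

```python
class PrefixDispatcher:
    def hello_command(self, arg=""):
        return f"hello {arg}".strip()

    def help_command(self, arg=""):
        return "help"

    def quit_command(self, arg=""):
        return "bye"

    def commands(self):
        # Every method named xxx_command defines a command xxx
        suffix = '_command'
        return sorted(name[:-len(suffix)] for name in dir(type(self))
                      if name.endswith(suffix))

    def resolve(self, prefix):
        return [cmd for cmd in self.commands() if cmd.startswith(prefix)]

    def execute(self, line):
        cmd, _, arg = line.strip().partition(' ')
        matches = self.resolve(cmd)
        if len(matches) != 1:
            return None  # unknown or ambiguous command
        return getattr(self, matches[0] + '_command')(arg.strip())

dispatcher = PrefixDispatcher()
print(dispatcher.resolve('h'))             # two candidates: ambiguous
print(dispatcher.execute('q'))             # unique prefix of 'quit'
print(dispatcher.execute('hell world'))    # unique prefix of 'hello'
```

A consequence of prefix matching is that adding a new command can make a formerly unique abbreviation ambiguous. In the `Debugger` class above, for instance, `c` matches both `clear` and `continue`.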

In [1092]:
debugger = Debugger()

In [1093]:
debugger.commands()

['break', 'clear', 'continue', 'finish', 'help', 'list', 'print', 'step']

In [1094]:
debugger.execute('h')

break      -- Set a breakpoint in given line. If no line is given, print all breakpoints
clear      -- Clear a breakpoint in given line. If no line is given, clear all breakpoints
continue   -- Resume execution
finish     -- Finish execution
help       -- Give help on given command. If no command is given, give help on all
list       -- Show current line
print      -- Print an expression. If no expression is given, print all variables
step       -- Execute up to the next line


In [1095]:
debugger.execute('b 25')

Breakpoints: {25}


In [1096]:
debugger.execute('h q')

Unknown command 'q'. Possible commands are:
break      -- Set a breakpoint in given line. If no line is given, print all breakpoints
clear      -- Clear a breakpoint in given line. If no line is given, clear all breakpoints
continue   -- Resume execution
finish     -- Finish execution
help       -- Give help on given command. If no command is given, give help on all
list       -- Show current line
print      -- Print an expression. If no expression is given, print all variables
step       -- Execute up to the next line


In [1106]:
debugger.execute('co')

* Set up the Makefile such that INTERACTIVE=False environment var is set
* Set up an own input() function that simply simulates input if in HTML; point to interactive notebook to try out things

### Excursion: All the Details

This text will only show up on demand (HTML) or not at all (PDF). This is useful for longer implementations, or repetitive, or specialized parts.

### End of Excursion

## _Section 3_

\todo{Add}

_If you want to introduce code, it is helpful to state the most important functions, as in:_

* `random.randrange(start, end)` - return a random integer from `start` up to, but not including, `end`
* `range(start, end)` - create a sequence of integers from `start` to `end` - 1.  Typically used in iterations.
* `for elem in list: body` executes `body` in a loop with `elem` taking each value from `list`.
* `for i in range(start, end): body` executes `body` in a loop with `i` from `start` to `end` - 1.
* `chr(n)` - return a character with ASCII code `n`

In [1097]:
import random

In [1098]:
def int_fuzzer():
    """A simple function that returns a random integer"""
    return random.randrange(1, 100) + 0.5

In [1099]:
# More code
pass

## _Section 4_

\todo{Add}

## Synopsis

_For those only interested in using the code in this chapter (without wanting to know how it works), give an example.  This will be copied to the beginning of the chapter (before the first section) as text with rendered input and output._

You can use `int_fuzzer()` as:

In [1100]:
print(int_fuzzer())

80.5


## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

_Close the chapter with a few exercises such that people have things to do.  To make the solutions hidden (to be revealed by the user), have them start with_

```
**Solution.**
```

_Your solution can then extend up to the next title (i.e., any markdown cell starting with `#`)._

_Running `make metadata` will automatically add metadata to the cells such that the cells will be hidden by default, and can be uncovered by the user.  The button will be introduced above the solution._

### Exercise 1: _Title_

_Text of the exercise_

In [1101]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [1102]:
# Some code for the solution
2 + 2

4

_Some more text for the solution_

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_