# Mining Function Specifications

When testing a program, one not only needs to cover its several behaviors; one also needs to _check_ whether the result is as expected.  In this chapter, we introduce a technique that allows us to _mine_ function specifications from a set of given executions, resulting in abstract and formal _descriptions_ of what the function expects and what it delivers.  

These so-called _dynamic invariants_ produce pre- and postconditions over function arguments and variables from a set of executions.  They are useful in a variety of contexts:

* Dynamic invariants provide important information for [symbolic fuzzing](SymbolicFuzzer.ipynb), such as types and ranges of function arguments.
* Dynamic invariants provide pre- and postconditions for formal program proofs and verification.
* Dynamic invariants provide a large number of assertions that can check whether function behavior has changed.

Traditionally, dynamic invariants are dependent on the executions they are derived from.  However, when paired with comprehensive test generators, they quickly become very precise, as we show in this chapter.

**Prerequisites**

* You should be familiar with tracing program executions, as in the [chapter on coverage](Coverage.ipynb).
* Later in this section, we make use of Python program transformations; some knowledge of the Python AST functions is helpful.
* The interplay with symbolic testing builds on, well, [symbolic testing](SymbolicFuzzer.ipynb).

In [None]:
import fuzzingbook_utils

In [None]:
import Intro_Testing

## Specifications and Assertions

When implementing a function or program, one usually works against a _specification_ – a set of documented requirements to be satisfied by the code.  Such specifications can come in natural language.  A formal specification, however, allows the computer to check whether the specification is satisfied.

In the [introduction to testing](Intro_Testing.ipynb), we have seen how _preconditions_ and _postconditions_ can describe what a function does.  Consider the following (simple) square root function:

In [None]:
def my_sqrt(x):
    assert x >= 0  # Precondition
    
    ...
    
    assert result * result == x  # Postcondition
    return result

The assertion `assert p` checks the condition `p`; if it does not hold, execution is aborted.  Here, the actual body is not yet written; we use the assertions as a specification of what `my_sqrt()` _expects_, and what it _delivers_.

The topmost assertion is the _precondition_, stating the requirements on the function arguments.  The assertion at the end is the _postcondition_, stating the properties of the function result (including its relationship with the original arguments).  Using these pre- and postconditions as a specification, we can now go and implement a square root function that satisfies them.  Once implemented, we can have the assertions check at runtime whether `my_sqrt()` works as expected; a [symbolic](SymbolicFuzzer.ipynb) or [concolic](ConcolicFuzzer.ipynb) test generator will even specifically try to find inputs where the assertions do _not_ hold.  (An assertion can be seen as a conditional branch towards aborting the execution, and any technique that tries to cover all code branches will also try to invalidate as many assertions as possible.)
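To make the role of the two assertions concrete, here is a minimal sketch of a fully implemented function guarded by a precondition and a postcondition.  The name `checked_isqrt()` is hypothetical and chosen for illustration only; the sketch relies on `math.isqrt()` from the standard library (Python 3.8 and later) rather than on the `my_sqrt()` implementation discussed in this chapter.

```python
import math


def checked_isqrt(n):
    """Integer square root with explicit pre- and postconditions (illustrative)"""
    # Precondition: what the function expects
    assert isinstance(n, int) and n >= 0

    result = math.isqrt(n)

    # Postcondition: what the function delivers,
    # relating the result to the original argument
    assert result * result <= n < (result + 1) * (result + 1)
    return result


print(checked_isqrt(17))
```

Calling `checked_isqrt(-1)` violates the precondition and aborts with an `AssertionError` before any computation takes place.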

However, not every piece of code is developed with explicit specifications in the first place, let alone with formal pre- and postconditions.  (Just take a look at the chapters in this book.)  This is a pity: As Ken Thompson famously said, "Without specifications, there are no bugs – only surprises".  It is also a problem for testing, since, of course, testing needs some specification to test against.  This raises the interesting question: Can we somehow _retrofit_ existing code with "specifications" that properly describe its behavior, allowing developers to simply _check_ them rather than having to write them from scratch?  This is what we do in this chapter.

## Mining Type Specifications

For our Python code, one of the most important "specifications" we need is *types*.  Python being a "dynamically" typed language means that all data types are determined at run time; the code itself does not explicitly state whether a variable is an integer, a string, an array, a dictionary – or whatever.  For the _writer_ of Python code, omitting explicit type declarations may save time (and allows for some fun hacks).  It is unclear whether the lack of types helps humans in _reading_ and _understanding_ code.  For a _computer_ trying to analyze code, the lack of explicit types is detrimental.  If, say, a constraint solver sees `if x:` and cannot know whether `x` is supposed to be a number or a string, this introduces an ambiguity which multiplies over the entire analysis in a combinatorial explosion.  Our first task thus will be to mine _static_ types (as part of the code) from _values_ we observe at run time.

How can we mine types from executions?  The answer is simple: 

1. We observe a function during execution
2. We track the _types_ of its arguments
3. We include these types as annotations or assertions in the code.

To do so, we can make use of Python's tracing facility we already observed in the [chapter on coverage](Coverage.ipynb).  With every call to a function, we retrieve the arguments, their values, and their types.

As an example, consider the full implementation of `my_sqrt()` from the [introduction to testing](Intro_Testing.ipynb):

In [None]:
import fuzzingbook_utils

In [None]:
def my_sqrt(x):
    """Computes the square root of x, using the Newton-Raphson method"""
    approx = None
    guess = x / 2
    while approx != guess:
        approx = guess
        guess = (approx + x / approx) / 2
    return approx

`my_sqrt()` does not come with any assertions that would check types or values.  Hence, it is easy for callers to make mistakes when calling `my_sqrt()`:

In [None]:
from ExpectError import ExpectError, ExpectTimeout

In [None]:
with ExpectError():
    my_sqrt("foo")

In [None]:
with ExpectError():
    x = my_sqrt(0)

At least, the Python system catches these errors at runtime.  The following call, however, simply lets the function enter an infinite loop:

In [None]:
with ExpectTimeout(1):
    x = my_sqrt(-1)

Our goal is to avoid such errors by _annotating_ functions with type information that prevents errors like the above ones.
These type annotations would be _mined_ from actual function executions, _learning_ from (normal) runs what the expected argument and return types should be.  By observing a series of calls such as these, we could infer that both `x` and the return value are of type `float`:

In [None]:
my_sqrt(25.0)
my_sqrt(4.0)

This observation could result in the following annotated variant, which comes with types that can be checked both at compile time and runtime:

In [None]:
def my_sqrt_annotated(x: float) -> float:
    """Computes the square root of x, using the Newton-Raphson method"""
    ...

### Tracking Calls

To observe argument types at runtime, we define a _tracer function_ that tracks the execution of `my_sqrt()`, checking its arguments and return values.  The `Tracker` class is set to trace functions in a `with` block as follows:

```python
with Tracker() as tracker:
    function_to_be_tracked(...)
info = tracker.collected_information()
```

As in the [chapter on coverage](Coverage.ipynb), we use the `sys.settrace()` function to trace individual functions during execution.  We turn on tracking when the `with` block starts; at this point, the `__enter__()` method is called.  When execution of the `with` block ends, `__exit__()` is called.  

In [None]:
import sys

In [None]:
class Tracker(object):
    def __init__(self, log=False):
        self._log = log
        self.reset()

    def reset(self):
        self._calls = {}
        self._stack = []

    def traceit(self, frame, event, arg):
        """Placeholder to be overridden in subclasses"""
        pass

    # Start of `with` block
    def __enter__(self):
        self.original_trace_function = sys.gettrace()
        sys.settrace(self.traceit)
        return self

    # End of `with` block
    def __exit__(self, exc_type, exc_value, tb):
        sys.settrace(self.original_trace_function)

The `traceit()` method does nothing yet; this is done in specialized subclasses.  The `CallTracker` class implements a `traceit()` function that checks for function calls and returns:

In [None]:
class CallTracker(Tracker):
    # Tracking function: Record all calls and all args
    def traceit(self, frame, event, arg):
        if event == "call":
            self.trace_call(frame, event, arg)
        elif event == "return":
            self.trace_return(frame, event, arg)
            
        return self.traceit

`trace_call()` is called when a function is called; it retrieves the function name and current arguments, and saves them on a stack.

In [None]:
class CallTracker(CallTracker):
    def trace_call(self, frame, event, arg):
        code = frame.f_code
        function_name = code.co_name
        arguments = get_arguments(frame)
        self._stack.append((function_name, arguments))

        if self._log:
            print(simple_call_string(function_name, arguments))

In [None]:
def get_arguments(frame):
    """Return call arguments in the given frame"""
    # When called, all arguments are local variables
    arguments = [(var, frame.f_locals[var]) for var in frame.f_locals]
    arguments.reverse()  # Want same order as call
    return arguments

When the function returns, `trace_return()` is called.  We now also have the return value.  We log the whole call with arguments and return value (if desired) and save it in our list of calls.

In [None]:
class CallTracker(CallTracker):
    def trace_return(self, frame, event, arg):
        code = frame.f_code
        function_name = code.co_name
        return_value = arg
        
        called_function_name, called_arguments = self._stack.pop()
        assert function_name == called_function_name
        
        if self._log:
            print(simple_call_string(function_name, called_arguments), "returns", return_value)
            
        self.add_call(function_name, called_arguments, return_value)

`simple_call_string()` is a helper for logging that prints out calls in a user-friendly manner.

In [None]:
def simple_call_string(function_name, argument_list, return_value=None):
    """Return function_name(arg[0], arg[1], ...) as a string"""
    call = function_name + "(" + \
        ", ".join([var + "=" + repr(value)
                   for (var, value) in argument_list]) + ")"

    if return_value is not None:
        call += " = " + repr(return_value)
        
    return call

`add_call()` saves the calls in a list; each function name has its own list.

In [None]:
class CallTracker(CallTracker):
    def add_call(self, function_name, arguments, return_value=None):
        """Add given call to list of calls"""
        if function_name not in self._calls:
            self._calls[function_name] = []
        self._calls[function_name].append((arguments, return_value))

Using `calls()`, we can retrieve the list of calls, either for a given function, or for all functions.

In [None]:
class CallTracker(CallTracker):
    def calls(self, function_name=None):
        if function_name is None:
            return self._calls

        return self._calls[function_name]

Let us now put this to use.  We turn on logging to track the individual calls and their return values:

In [None]:
with CallTracker(log=True) as tracker:
    y = my_sqrt(25)
    y = my_sqrt(2.0)

After execution, we can retrieve the individual calls:

In [None]:
calls = tracker.calls('my_sqrt')
calls

Each call is a pair (`argument_list`, `return_value`), where `argument_list` is a list of pairs (`parameter_name`, `value`).

In [None]:
my_sqrt_argument_list, my_sqrt_return_value = calls[0]
simple_call_string('my_sqrt', my_sqrt_argument_list, my_sqrt_return_value)

If the function does not return a value, `return_value` is `None`.

In [None]:
def hello(name):
    print("Hello,", name)

In [None]:
with CallTracker() as tracker:
    hello("world")

In [None]:
hello_calls = tracker.calls('hello')
hello_calls

In [None]:
hello_argument_list, hello_return_value = hello_calls[0]
simple_call_string('hello', hello_argument_list, hello_return_value)

### Getting Types

Despite what you may have read or heard, Python is actually a _typed_ language.  It is just that it is _dynamically typed_ – types are used and checked only at runtime (rather than declared in the code, where they can be _statically checked_ at compile time).  We can thus retrieve types of all values within Python:

In [None]:
type(4)

In [None]:
type(2.0)

In [None]:
type([4])

We can retrieve the type of the first argument to `my_sqrt()`:

In [None]:
parameter, value = my_sqrt_argument_list[0]
parameter, type(value)

as well as the type of the return value:

In [None]:
type(my_sqrt_return_value)

Hence, we see that (so far), `my_sqrt()` is a function taking (among others) integers and returning floats.  We could declare `my_sqrt()` as:

In [None]:
def my_sqrt_annotated(x: int) -> float:
    return my_sqrt(x)

This is a representation we could pass to a static type checker, allowing us to check whether calls to `my_sqrt()` actually pass a number.  A dynamic type checker could run such checks at runtime.  And of course, any [symbolic interpretation](SymbolicFuzzer.ipynb) will greatly profit from the additional annotations.

By default, Python does not do anything with such annotations.  However, tools can access annotations from functions and other objects:

In [None]:
my_sqrt_annotated.__annotations__
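As an illustration of what a dynamic type checker might do with these annotations, here is a minimal sketch.  The decorator `check_types()` and the function `half()` are hypothetical names, not part of this chapter's code; the sketch only handles annotations that are plain classes such as `int` or `float` and silently ignores everything else.

```python
import functools
import inspect


def check_types(func):
    """Hypothetical decorator: enforce simple annotations at runtime"""
    annotations = func.__annotations__
    signature = inspect.signature(func)

    def check(name, value):
        expected = annotations.get(name)
        # Only plain classes such as int or float are checked in this sketch
        if isinstance(expected, type) and not isinstance(value, expected):
            raise TypeError(f"{name}: expected {expected.__name__}, "
                            f"got {type(value).__name__}")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Map positional and keyword arguments to parameter names
        bound = signature.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            check(name, value)
        result = func(*args, **kwargs)
        check('return', result)
        return result

    return wrapper


@check_types
def half(x: int) -> float:
    return x / 2


print(half(4))
```

A call such as `half("four")` now raises a `TypeError` naming the offending parameter, rather than failing somewhere inside the function body.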

### Accessing Function Structure

Our plan is to annotate functions automatically, based on the types we have seen.  To do so, we need a few modules that allow us to convert a function into a tree representation (called an _abstract syntax tree_, or AST) and back; we have already seen these in the chapters on [concolic](ConcolicFuzzer.ipynb) and [symbolic](SymbolicFuzzer.ipynb) fuzzing.

In [None]:
import ast
import inspect
import astunparse

We can get the source of a Python function as follows.  (Note that this does not work for functions defined in other notebooks.)

In [None]:
my_sqrt_source = inspect.getsource(my_sqrt)
my_sqrt_source

To view these in a visually pleasing form, the function `print_content(s, suffix)` formats and highlights the string `s` as if it were a file with ending `suffix`.  We can thus view (and highlight) the source as if it were a Python file:

In [None]:
from fuzzingbook_utils import print_content

In [None]:
print_content(my_sqrt_source, '.py')

Parsing this gives us an abstract syntax tree (AST):

In [None]:
my_sqrt_ast = ast.parse(my_sqrt_source)

The helper functions `astunparse.dump()` (textual output) and `showast.show_ast()` (graphical output with [showast](https://github.com/hchasestevens/show_ast)) allow us to inspect the structure of the tree.  We see that the function starts as a `FunctionDef` with name and arguments, followed by a body, which is a list of statements of type `Expr` (the docstring), type `Assign` (assignments), `While` (while loop with its own body), and finally `Return`.

In [None]:
print(astunparse.dump(my_sqrt_ast))

Too much text for you?  This graphical representation may make things simpler.

In [None]:
from fuzzingbook_utils import rich_output

In [None]:
if rich_output():
    import showast
    showast.show_ast(my_sqrt_ast)

The function `astunparse.unparse()` converts such a tree back into the more familiar textual Python code representation:

In [None]:
print_content(astunparse.unparse(my_sqrt_ast), '.py')
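If the third-party `astunparse` package is not available, the same round trip can be sketched with the standard library alone, assuming Python 3.9 or later, where `ast.unparse()` was added.  The function `double()` below is a hypothetical stand-in for `my_sqrt()`.

```python
import ast

source = "def double(x):\n    return 2 * x\n"
tree = ast.parse(source)

# ast.walk() visits every node in the tree; collect the node type names
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)

# Convert the tree back to source text (requires Python 3.9 or later)
regenerated = ast.unparse(tree)
print(regenerated)
```

The regenerated text may differ from the original in layout and comments, but parsing it again yields an equivalent tree.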

### Annotating Functions

Let us now go and transform these trees with type annotations.  We start with a helper function `parse_type(name)` which  parses a type name into an AST.

In [None]:
def parse_type(name):
    class ValueVisitor(ast.NodeVisitor):
        def visit_Expr(self, node):
            self.value_node = node.value
        
    tree = ast.parse(name)
    name_visitor = ValueVisitor()
    name_visitor.visit(tree)
    return name_visitor.value_node

In [None]:
print(astunparse.dump(parse_type('int')))

In [None]:
print(astunparse.dump(parse_type('[object]')))

We now define a helper function that actually adds type annotations to a function AST.  The `TypeTransformer` class builds on the Python standard library `ast.NodeTransformer` infrastructure.  It would be called as

```python
    TypeTransformer({'x': 'int'}, 'float').visit(ast)
```

to annotate the arguments of `my_sqrt()`: `x` with `int`, and the return type with `float`.  The returned AST can then be unparsed, compiled or analyzed.

In [None]:
class TypeTransformer(ast.NodeTransformer):
    def __init__(self, argument_types, return_type=None):
        self.argument_types = argument_types
        self.return_type = return_type
        super().__init__()

The core of `TypeTransformer` is the method `visit_FunctionDef()`, which is called for every function definition in the AST.  Its argument `node` is the subtree of the function definition to be transformed.  Our implementation accesses the individual arguments and invokes `annotate_arg()` on each of them; it also sets the return type in the `returns` attribute of the node.

In [None]:
class TypeTransformer(TypeTransformer):
    def visit_FunctionDef(self, node):
        """Add annotation to function"""
        # Set argument types
        node.args.args = [self.annotate_arg(arg) for arg in node.args.args]

        # Set return type
        if self.return_type is not None:
            node.returns = parse_type(self.return_type)

        # We update the node in place rather than rebuilding `ast.arguments`
        # and `ast.FunctionDef` positionally, since their field lists
        # differ across Python versions (e.g. `posonlyargs` since 3.8)
        return node

Each argument gets its own annotation, taken from the types originally passed to the class:

In [None]:
class TypeTransformer(TypeTransformer):
    def annotate_arg(self, arg):
        """Add annotation to single function argument"""
        arg_name = arg.arg
        if arg_name in self.argument_types:
            arg.annotation = parse_type(self.argument_types[arg_name])
        return arg

Does this work?  Let us annotate the AST from `my_sqrt()` with types for the arguments and return types:

In [None]:
new_ast = TypeTransformer({'x': 'int'}, 'float').visit(my_sqrt_ast)

When we unparse the new AST, we see that the annotations actually are present:

In [None]:
print_content(astunparse.unparse(new_ast), '.py')

Similarly, we can annotate the `hello()` function from above:

In [None]:
hello_source = inspect.getsource(hello)

In [None]:
hello_ast = ast.parse(hello_source)

In [None]:
new_ast = TypeTransformer({'name': 'str'}, 'None').visit(hello_ast)

In [None]:
print_content(astunparse.unparse(new_ast), '.py')
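Besides unparsing, an annotated AST can also be compiled and executed directly, which yields a live function object carrying the annotations.  The following is a minimal, self-contained sketch: it annotates a hypothetical `double()` function by hand (standing in for the `TypeTransformer` step) and uses only the standard `ast` module.

```python
import ast

source = "def double(x):\n    return 2 * x\n"
tree = ast.parse(source)

# Annotate by hand; this stands in for the transformer step
function_def = tree.body[0]
function_def.args.args[0].annotation = ast.Name(id='int', ctx=ast.Load())
function_def.returns = ast.Name(id='int', ctx=ast.Load())

# Newly created nodes have no line numbers; compile() requires them
ast.fix_missing_locations(tree)

# Compile the tree and execute the definition in a fresh namespace
namespace = {}
exec(compile(tree, filename='<annotated>', mode='exec'), namespace)
annotated_double = namespace['double']

print(annotated_double.__annotations__)
```

Executing the definition in a separate namespace keeps the annotated variant from overwriting the original function.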

### Annotating Functions with Mined Types

Let us now bring together the mining of types with the annotation.  We start with a set of calls as returned by `CallTracker.calls()`:

In [None]:
with CallTracker() as tracker:
    y = my_sqrt(25.0)
    y = my_sqrt(2.0)

In [None]:
tracker.calls()

For each of these values, we have to determine the appropriate type (as a string):

In [None]:
def type_string(value):
    return type(value).__name__

In [None]:
type_string(4)

In [None]:
type_string([])

The function `annotate_types()` takes such a list of calls and annotates each function listed:

In [None]:
def annotate_types(calls):
    annotated_functions = {}
    
    for function_name in calls:
        try:
            annotated_functions[function_name] = annotate_function_with_types(function_name, calls[function_name])
        except KeyError:
            continue

    return annotated_functions

For each function, we get the source and its AST:

In [None]:
def annotate_function_with_types(function_name, function_calls):
    function = globals()[function_name]  # May raise KeyError for internal functions
    function_code = inspect.getsource(function)
    function_ast = ast.parse(function_code)
    return annotate_function_ast_with_types(function_ast, function_calls)

We invoke the `TypeTransformer` with the calls seen, and for each call, iterate over the arguments, determine their types, and annotate the AST with these.  The universal type `Any` is used when we encounter type conflicts, which we will discuss below.

In [None]:
from typing import Any

In [None]:
def annotate_function_ast_with_types(function_ast, function_calls):
    parameter_types = {}
    return_type = None

    for calls_seen in function_calls:
        args, return_value = calls_seen
        if return_value is not None:
            if return_type is not None and return_type != type_string(return_value):
                return_type = 'Any'
            else:
                return_type = type_string(return_value)
            
            
        for parameter, value in args:
            try:
                different_type = parameter_types[parameter] != type_string(value)
            except KeyError:
                different_type = False
                
            if different_type:
                parameter_types[parameter] = 'Any'
            else:
                parameter_types[parameter] = type_string(value)
        
    annotated_function_ast = TypeTransformer(parameter_types, return_type).visit(function_ast)
    return annotated_function_ast

Here is `my_sqrt()` annotated with the types recorded using the tracker above.

In [None]:
print_content(astunparse.unparse(annotate_types(tracker.calls())['my_sqrt']), '.py')
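The rule for combining types across calls can be isolated in a few lines.  This is a sketch of the logic only; `merge_type()` and `merged_type_of()` are hypothetical helpers, not functions defined in this chapter.

```python
def merge_type(seen, new):
    """Combine a previously seen type name with a newly observed one"""
    if seen is None or seen == new:
        return new
    return 'Any'  # Conflicting types: fall back to the universal type


def merged_type_of(values):
    """Return the single type name for all values, or 'Any' on conflict"""
    result = None
    for value in values:
        result = merge_type(result, type(value).__name__)
    return result


print(merged_type_of([25.0, 2.0]))
print(merged_type_of([25.0, 4]))
```

Once a parameter has been widened to `'Any'`, it stays that way: every later observation differs from `'Any'` and hence results in `'Any'` again.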

### All-in-one Annotation

Let us bring all of this together in a single class `TypeAnnotator` that first tracks calls of functions and then allows us to access the AST (and the source code form) of the tracked functions annotated with types.  `typed_functions()` returns the annotated functions as a string; `typed_functions_ast()` returns their AST.

In [None]:
class TypeTracker(CallTracker):
    pass

In [None]:
class TypeAnnotator(TypeTracker):
    def typed_functions_ast(self, function_name=None):
        if function_name is None:
            return annotate_types(self.calls())
        
        return annotate_function_with_types(function_name, self.calls(function_name))
    
    def typed_functions(self, function_name=None):
        if function_name is None:
            functions = ''
            for f_name in self.calls():
                try:
                    f_text = astunparse.unparse(self.typed_functions_ast(f_name))
                except KeyError:
                    f_text = ''
                functions += f_text
            return functions

        return astunparse.unparse(self.typed_functions_ast(function_name))

Here is how to use `TypeAnnotator`.  We first track a series of calls:

In [None]:
with TypeAnnotator() as annotator:
    y = my_sqrt(25.0)
    y = my_sqrt(2.0)

After tracking, we can immediately retrieve an annotated version of the functions tracked:

In [None]:
print_content(annotator.typed_functions(), '.py')

This also works for multiple and diverse functions.  One could go and implement an automatic type annotator for Python files based on the types seen during execution.

In [None]:
with TypeAnnotator() as annotator:
    hello('type annotations')
    y = my_sqrt(1.0)

In [None]:
print_content(annotator.typed_functions(), '.py')

### Multiple Types

Let us now resolve the role of the magic `Any` type in `annotate_function_ast_with_types()`.  If we see multiple types for the same argument, we set its type to `Any`.  For `my_sqrt()`, this makes sense, as its arguments can be integers as well as floats:

In [None]:
with CallTracker() as tracker:
    y = my_sqrt(25.0)
    y = my_sqrt(4)

In [None]:
print_content(astunparse.unparse(annotate_types(tracker.calls())['my_sqrt']), '.py')
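Falling back to `Any` discards the types we actually observed.  A more precise alternative, which is not what the code in this chapter does, would be to record all observed types and emit a `Union`.  The helper `union_type_string()` below is a hypothetical sketch of this idea; it assumes a non-empty list of values whose type names are valid in annotations (which excludes, for instance, `NoneType`).

```python
def union_type_string(values):
    """Name the types observed; use a Union if there are several"""
    names = sorted({type(value).__name__ for value in values})
    if len(names) == 1:
        return names[0]
    return 'Union[' + ', '.join(names) + ']'


print(union_type_string([25.0, 2.0]))
print(union_type_string([25.0, 4]))
```

For the resulting annotation to be evaluated, `Union` would have to be imported from the `typing` module, just as `Any` is.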

The following function `sum3()` can be called with floating-point numbers as arguments, resulting in the parameters getting a `float` type:

In [None]:
def sum3(a, b, c):
    return a + b + c

In [None]:
with TypeAnnotator() as annotator:
    y = sum3(1.0, 2.0, 3.0)
y

In [None]:
print_content(annotator.typed_functions(), '.py')

If we call `sum3()` with integers, though, the arguments get an `int` type:

In [None]:
with TypeAnnotator() as annotator:
    y = sum3(1, 2, 3)
y

In [None]:
print_content(annotator.typed_functions(), '.py')

And we can also call `sum3()` with strings, giving the arguments a `str` type:

In [None]:
with TypeAnnotator() as annotator:
    y = sum3("one", "two", "three")
y

In [None]:
print_content(annotator.typed_functions(), '.py')

If we have multiple calls, but with different types, `TypeAnnotator()` will assign an `Any` type to both arguments and return values:

In [None]:
with TypeAnnotator() as annotator:
    y = sum3(1, 2, 3)
    y = sum3("one", "two", "three")

In [None]:
typed_sum3_def = annotator.typed_functions('sum3')

In [None]:
print_content(typed_sum3_def, '.py')

A static checker can import the above annotated definition and then check its annotations:

In [None]:
exec(typed_sum3_def.replace('sum3', 'typed_sum3'))

In [None]:
assert typed_sum3.__annotations__ == {'a': Any, 'b': Any, 'c': Any, 'return': Any}

A more interesting usage for types, though, is in symbolic testing and reasoning, as discussed in the next section.

In [None]:
def f(x: Any):
    pass

### Static Checking

\todo{add}

In [None]:
import SymbolicFuzzer  # minor dependency

# Make this an exercise.  Also need local variables to be typed!

## Mining Invariants

Besides basic data types, we can infer several further properties from arguments.  We can, for instance, infer the _range_ of values a variable is in, and thus determine whether an argument can be negative, zero, or positive – a property that cannot be expressed in a (Python) type.  To this end, we use the same tracking functionality as before; instead of saving types for individual variables, we now save whether we have seen specific properties occurring.

A simple example of such a property is the range spanned by the _smallest_ and _largest_ values observed.  For a function such as `sqrt(x)`, for instance, we could go and find that in actual executions, the smallest value of `x` is zero, while there is (apparently) no upper limit for `x`.

Once we have mined such ranges, we can use them as _pre- and postconditions_ in functions – that is, we can have functions check automatically whether values fall into the ranges observed earlier.  A `sqrt(x)` function, for instance, could check whether its argument `x` is within the range `x >= 0` observed earlier – that is, a _precondition_:

In [None]:
def sqrt_with_precondition(x):
    assert x >= 0
    ...

For the _result_, a similar property should hold as a _postcondition_:

In [None]:
def sqrt_with_postcondition(x):
    return_value = ...
    assert return_value >= 0
    return return_value
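How such a range could be mined from observed values can be sketched in a few lines.  The helpers `mine_range()` and `range_condition()` are hypothetical and serve only to illustrate the idea; the chapter itself follows a more general, property-based approach.

```python
def mine_range(values):
    """Return the smallest and largest value observed"""
    return min(values), max(values)


def range_condition(name, values):
    """Express the observed range as a condition over the variable `name`"""
    low, high = mine_range(values)
    return f"{low!r} <= {name} <= {high!r}"


observed_arguments = [4, 0, 25, 2]
print(range_condition('x', observed_arguments))
```

Note that the upper bound only reflects the executions seen so far; for `sqrt(x)`, one would presumably keep the lower bound and discard the upper one.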

### Defining Properties

In [None]:
INVARIANT_PROPERTIES = [
    "X < 0",
    "X <= 0",
    "X > 0",
    "X >= 0",
    "X == 0",
    "X != 0",

    "X == Y",
    "X > Y",
    "X >= Y",
    "X <= Y",
    
    "isinstance(X, bool)",
    "isinstance(X, int)",
    "isinstance(X, float)",
    "isinstance(X, list)",
    "isinstance(X, dict)",
    
    "X == Y + Z",
    "X == Y * Z",
    "X == Y - Z",
    "X == Y / Z",

    "X < Y < Z",
    "X <= Y <= Z",
    "X > Y > Z",
    "X >= Y >= Z",

    "X == len(Y)",
    "X == sum(Y)",
    "X.startswith(Y)",
]

### Extracting Meta-Variables

In [None]:
def metavars(prop):
    metavar_list = []
    
    class ArgVisitor(ast.NodeVisitor):
        def visit_Name(self, node):
            if node.id.isupper():
                metavar_list.append(node.id)

    ArgVisitor().visit(ast.parse(prop))
    return metavar_list

In [None]:
assert metavars("X < 0") == ['X']

In [None]:
assert metavars("X.startswith(Y)") == ['X', 'Y']

In [None]:
assert metavars("isinstance(X, str)") == ['X']

### Instantiating Properties

In [None]:
def instantiate_prop_ast(prop, var_names):
    class NameTransformer(ast.NodeTransformer):
        def visit_Name(self, node):
            if node.id not in mapping:
                return node
            return ast.Name(id=mapping[node.id], ctx=ast.Load())

    meta_variables = metavars(prop)
    assert len(meta_variables) == len(var_names)

    # Map each metavariable (X, Y, ...) to its concrete variable name
    mapping = dict(zip(meta_variables, var_names))

    prop_ast = ast.parse(prop, mode='eval')
    new_ast = NameTransformer().visit(prop_ast)

    return new_ast

In [None]:
def instantiate_prop(prop, var_names):
    prop_ast = instantiate_prop_ast(prop, var_names)
    prop_text = astunparse.unparse(prop_ast).strip()
    while prop_text.startswith('(') and prop_text.endswith(')'):
        prop_text = prop_text[1:-1]
    return prop_text

In [None]:
assert instantiate_prop("X > Y", ['a', 'b']) == 'a > b'

In [None]:
assert instantiate_prop("X.startswith(Y)", ['x', 'y']) == 'x.startswith(y)'

### Evaluating Properties

In [None]:
def prop_function_text(prop):
    return "lambda " + ", ".join(metavars(prop)) + ": " + prop

In [None]:
prop_function_text("X > Y")

In [None]:
def prop_function_ast(prop):
    return ast.parse(prop_function_text(prop), mode='eval')

In [None]:
prop_ast = prop_function_ast("X > Y")
ast.dump(prop_ast)

In [None]:
def prop_function(prop):
    return eval(prop_function_text(prop))

In [None]:
f = prop_function("X > Y")
f(100, 1)

### Extracting Invariants

In [None]:
import itertools

In [None]:
for combination in itertools.permutations(['a', 'b', 'c'], 2):
    print(combination)

In [None]:
def true_property_instantiations(prop, vars_and_values, log=False):
    instantiations = set()
    p = prop_function(prop)

    len_metavars = len(metavars(prop))
    for combination in itertools.permutations(vars_and_values, len_metavars):
        args = [value for var_name, value in combination]
        var_names = [var_name for var_name, value in combination]
        
        try:
            result = p(*args)
        except Exception:  # Property not applicable to these values
            result = None

        if log:
            print(prop, combination, result)
        if result:
            instantiations.add((prop, tuple(var_names)))
            
    return instantiations

In [None]:
true_property_instantiations("X < Y", [('x', -1), ('y', 1)], log=True)

In [None]:
true_property_instantiations("X < 0", [('x', -1), ('y', 1)], log=True)

In [None]:
class InvariantTracker(CallTracker):
    def __init__(self, props=None, **kwargs):
        if props is None:
            props = INVARIANT_PROPERTIES

        self.props = props
        super().__init__(**kwargs)

In [None]:
class InvariantTracker(InvariantTracker):
    def invariants(self, function_name=None):
        if function_name is None:
            return {function_name: self.invariants(function_name) for function_name in self.calls()}
        
        invariants = None
        for variables, return_value in self.calls(function_name):
            vars_and_values = variables + [('return_value', return_value)]
            
            s = set()
            for prop in self.props:
                s |= true_property_instantiations(prop, vars_and_values, self._log)
            if invariants is None:
                invariants = s
            else:
                invariants &= s

        return invariants

In [None]:
with InvariantTracker() as tracker:
    y = my_sqrt(25)
    y = my_sqrt(10)

tracker.calls()

In [None]:
tracker.invariants()

In [None]:
invs = tracker.invariants('my_sqrt')
invs

In [None]:
def pretty_invariants(invariants):
    props = []
    for (prop, var_names) in invariants:
        props.append(instantiate_prop(prop, var_names))
    return sorted(props)

In [None]:
pretty_invariants(invs)

In [None]:
with InvariantTracker() as tracker:
    y = sum3(1, 2, 3)
    y = sum3(-4, -5, -6)
    
pretty_invariants(tracker.invariants('sum3'))

In [None]:
with InvariantTracker() as tracker:
    y = sum3('a', 'b', 'c')
    y = sum3('f', 'e', 'd')
    
pretty_invariants(tracker.invariants('sum3'))

In [None]:
with InvariantTracker() as tracker:
    y = sum3('a', 'b', 'c')
    y = sum3('c', 'b', 'a')
    y = sum3(-4, -5, -6)
    y = sum3(0, 0, 0)
    
pretty_invariants(tracker.invariants('sum3'))

### Annotating Functions with Pre- and Postconditions

In [None]:
import functools

In [None]:
# from https://stackoverflow.com/questions/12151182/python-precondition-postcondition-for-member-function-how

def condition(precondition=None, postcondition=None):
    def decorator(func):
        @functools.wraps(func)  # preserve name, docstring, etc.
        def wrapper(*args, **kwargs):  # NOTE: no self
            if precondition is not None:
                assert precondition(*args, **kwargs), "Precondition violated"

            retval = func(*args, **kwargs)  # call original function or method
            if postcondition is not None:
                assert postcondition(retval, *args, **kwargs), "Postcondition violated"

            return retval
        return wrapper
    return decorator

def precondition(check):
    return condition(precondition=check)

def postcondition(check):
    return condition(postcondition=check)

In [None]:
@precondition(lambda x: x > 0)
def my_sqrt_with_precondition(x):
    return my_sqrt(x)

In [None]:
with ExpectError():
    my_sqrt_with_precondition(-1)

In [None]:
EPSILON = 1e-5

In [None]:
@postcondition(lambda ret, x: abs(ret * ret - x) < EPSILON)
def my_sqrt_with_postcondition(x):
    return my_sqrt(x)

In [None]:
y = my_sqrt_with_postcondition(2)
y

In [None]:
@postcondition(lambda ret, x: abs(ret * ret - x) < EPSILON)
def buggy_my_sqrt_with_postcondition(x):
    return my_sqrt(x) + 0.1

In [None]:
with ExpectError():
    y = buggy_my_sqrt_with_postcondition(2)

### Converting Mined Invariants to Annotations

In [None]:
class InvariantAnnotator(InvariantTracker):
    def params(self, function_name):
        arguments, return_value = self.calls(function_name)[0]
        return ", ".join(arg_name for (arg_name, arg_value) in arguments)

In [None]:
with InvariantAnnotator() as annotator:
    y = my_sqrt(25)
    y = sum3(1, 2, 3)

In [None]:
annotator.params('my_sqrt')

In [None]:
annotator.params('sum3')

In [None]:
class InvariantAnnotator(InvariantAnnotator):
    def preconditions(self, function_name):
        conditions = []

        for inv in pretty_invariants(self.invariants(function_name)):
            if inv.find("return_value") >= 0:
                continue  # Postcondition

            cond = "@precondition(lambda " + self.params(function_name) + ": " + inv + ")"
            conditions.append(cond)

        return conditions

In [None]:
with InvariantAnnotator() as annotator:
    y = my_sqrt(25)
    y = sum3(1, 2, 3)

In [None]:
annotator.preconditions('my_sqrt')

In [None]:
class InvariantAnnotator(InvariantAnnotator):
    def postconditions(self, function_name):
        conditions = []

        for inv in pretty_invariants(self.invariants(function_name)):
            if inv.find("return_value") < 0:
                continue  # Precondition

            cond = "@postcondition(lambda return_value, " + self.params(function_name) + ": " + inv + ")"
            conditions.append(cond)

        return conditions

In [None]:
with InvariantAnnotator() as annotator:
    y = my_sqrt(25)
    y = sum3(1, 2, 3)

In [None]:
annotator.postconditions('my_sqrt')

In [None]:
class InvariantAnnotator(InvariantAnnotator):
    def functions_with_invariants(self):
        functions = ""
        for function_name in self.invariants():
            try:
                function = self.function_with_invariants(function_name)
            except KeyError:
                continue
            functions += function
        return functions

    def function_with_invariants(self, function_name):
        function = globals()[function_name]  # Can throw KeyError
        source = inspect.getsource(function)
        return "\n".join(self.preconditions(function_name) + 
                         self.postconditions(function_name)) + '\n' + source

In [None]:
with InvariantAnnotator() as annotator:
    y = my_sqrt(25)
    y = sum3(1, 2, 3)

In [None]:
print_content(annotator.function_with_invariants('my_sqrt'), '.py')

In [None]:
def list_length(L):
    if L == []:
        length = 0
    else:
        length = 1 + list_length(L[1:])
    return length

In [None]:
with InvariantAnnotator() as annotator:
    length = list_length([1, 2, 3])

print_content(annotator.functions_with_invariants(), '.py')

In [None]:
def print_sum(a, b):
    print(a + b)

In [None]:
with InvariantAnnotator() as annotator:
    print_sum(31, 45)

In [None]:
print_content(annotator.functions_with_invariants(), '.py')

## Checking Specifications

In [None]:
with InvariantAnnotator() as annotator:
    y = my_sqrt(25)

In [None]:
my_sqrt_def = annotator.functions_with_invariants()

In [None]:
print_content(my_sqrt_def, '.py')

In [None]:
exec(my_sqrt_def.replace('my_sqrt', 'my_sqrt_annotated'))

In [None]:
with ExpectError():
    my_sqrt_annotated(-1)

In [None]:
with ExpectTimeout(1):
    my_sqrt(-1)

## Mining Specifications from Generated Tests

Mined specifications can only be as good as the executions they were mined from.  If we only see a single call to, say, `sum2()`, we will be faced with several mined pre- and postconditions that _overspecialize_ towards the values seen:

In [None]:
def sum2(a, b):
    return a + b

In [None]:
with InvariantAnnotator() as annotator:
    y = sum2(2, 2)
print_content(annotator.functions_with_invariants(), '.py')

The mined precondition `a == b`, for instance, only holds for the single call observed; the same holds for the mined postcondition `return_value == (a * b)`.  Yet, `sum2()` can obviously be successfully called with other values that do not satisfy these conditions.

To get out of this trap, we have to _learn from more and more diverse runs_.
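
Why more runs help can be sketched independently of the tracker classes above: each run yields the set of candidate properties that hold for it, and only properties in the intersection of all these sets survive. The candidate properties and the helper names `holding()` and `surviving()` below are assumptions made for this illustration; they are not part of the chapter's code.

```python
# Hypothetical candidate properties over the arguments and result of a + b
CANDIDATES = {
    "a == b": lambda a, b, ret: a == b,
    "return_value == a * b": lambda a, b, ret: ret == a * b,
    "return_value == a + b": lambda a, b, ret: ret == a + b,
    "return_value >= a": lambda a, b, ret: ret >= a,
}

def holding(a, b):
    """Return the names of all candidates that hold for one call computing a + b."""
    ret = a + b
    return {name for name, check in CANDIDATES.items() if check(a, b, ret)}

def surviving(runs):
    """Intersect the sets of holding candidates over all observed runs."""
    result = None
    for a, b in runs:
        result = holding(a, b) if result is None else result & holding(a, b)
    return result

# A single run keeps every candidate; diverse runs weed out the accidental ones
one_run = surviving([(2, 2)])
many_runs = surviving([(2, 2), (1, 2), (-1, -2), (0, 0)])
```

With the single call `(2, 2)`, all four candidates survive; adding the three further calls leaves only `return_value == a + b`.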

In [None]:
with InvariantAnnotator() as annotator:
    y = sum2(1, 2)
    y = sum2(-1, -2)
    y = sum2(0, 0)

print_content(annotator.functions_with_invariants(), '.py')

But where do we get such diverse runs from?  This is the job of generating software tests.  A simple grammar for calls of `sum2()` will easily resolve the problem.

In [None]:
from GrammarFuzzer import GrammarFuzzer
from Grammars import is_valid_grammar, crange, convert_ebnf_grammar

In [None]:
SUM2_EBNF_GRAMMAR = {
    "<start>": ["<sum2>"],
    "<sum2>": ["sum2(<int>, <int>)"],
    "<int>": ["<_int>"],
    "<_int>": ["(-)?<leaddigit><digit>*", "0"],
    "<leaddigit>": crange('1', '9'),
    "<digit>": crange('0', '9')
}

assert is_valid_grammar(SUM2_EBNF_GRAMMAR)

In [None]:
sum2_grammar = convert_ebnf_grammar(SUM2_EBNF_GRAMMAR)

In [None]:
sum2_fuzzer = GrammarFuzzer(sum2_grammar)
[sum2_fuzzer.fuzz() for i in range(10)]

In [None]:
with InvariantAnnotator() as annotator:
    for i in range(10):
        eval(sum2_fuzzer.fuzz())

print_content(annotator.function_with_invariants('sum2'), '.py')

But then, writing tests (or a test driver) just to derive a set of pre- and postconditions may be too much effort, in particular since tests can easily be derived from given pre- and postconditions.  Also, an API grammar such as the one above has to be set up such that it actually respects preconditions.

However, there is one exception to the rule: If one can automatically generate tests at the system level, then it is easy to extract function invariants from these very tests.  In the next part, we will explore doing this in a variety of contexts.
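
To make the earlier remark about preconditions concrete: one simple way for a test generator to respect a precondition is to discard generated arguments that violate it. The following is a minimal sketch under that assumption; `generate_valid_args()`, `sqrt_precondition`, and the random integer generator are hypothetical names introduced for illustration and are not part of the chapter's code.

```python
import random

def generate_valid_args(precondition, generate, tries=1000):
    """Return the first generated argument tuple that satisfies `precondition`."""
    for _ in range(tries):
        args = generate()
        if precondition(*args):
            return args
    raise ValueError("no valid arguments found")

rng = random.Random(0)  # fixed seed, so that runs are reproducible

# Hypothetical precondition, as it might be mined for a square root function
sqrt_precondition = lambda x: x > 0

# Generate single-integer argument tuples until one satisfies the precondition
args = generate_valid_args(sqrt_precondition,
                           lambda: (rng.randint(-100, 100),))
```

Filtering is wasteful when few inputs satisfy the precondition; a grammar that produces only valid calls avoids the wasted attempts, at the price of the setup effort discussed above.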

Exercise: Use a concolic fuzzer to systematically try to invalidate assertions.

Exercise: Use a grammar to generate even more properties.
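
As a possible starting point for this exercise, property patterns can be enumerated from a tiny grammar of the form `<term> <op> <term>`. This is only a sketch: the lists `TERMS` and `OPS` and the function `generate_properties()` are assumptions made for this example, and the resulting strings merely follow the `X < Y` style of properties used earlier in this chapter.

```python
import itertools

# Hypothetical mini-grammar: <prop> ::= <term> <op> <term>
TERMS = ["X", "Y", "0"]
OPS = ["<", "<=", "==", "!="]

def generate_properties():
    """Enumerate all properties `<term> <op> <term>` with two distinct terms."""
    props = []
    for left, op, right in itertools.product(TERMS, OPS, TERMS):
        if left == right:
            continue  # skip trivial comparisons such as `X == X`
        props.append(f"{left} {op} {right}")
    return props

props = generate_properties()
```

Since the two terms always differ and `0` is the only constant, every generated property mentions at least one metavariable.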

## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

Exercise: Have `InvariantTransformer` save initial values and take care of multiple returns.

Exercise: Implement implications

Exercise: Include local variables in postconditions



### Embedding Invariants into Functions

In [None]:
class EmbeddedInvariantAnnotator(InvariantTracker):
    def functions_with_invariants_ast(self, function_name=None):
        if function_name is None:
            return annotate_invariants(self.invariants())
        
        return annotate_function_with_invariants(function_name, self.invariants(function_name))
    
    def functions_with_invariants(self, function_name=None):
        if function_name is None:
            functions = ''
            for f_name in self.invariants():
                try:
                    f_text = astunparse.unparse(self.functions_with_invariants_ast(f_name))
                except KeyError:
                    f_text = ''
                functions += f_text
            return functions

        return astunparse.unparse(self.functions_with_invariants_ast(function_name))
    
    def function_with_invariants(self, function_name):
        return self.functions_with_invariants(function_name)

    def function_with_invariants_ast(self, function_name):
        return self.functions_with_invariants_ast(function_name)

In [None]:
def annotate_invariants(invariants):
    annotated_functions = {}
    
    for function_name in invariants:
        try:
            annotated_functions[function_name] = annotate_function_with_invariants(function_name, invariants[function_name])
        except KeyError:
            continue

    return annotated_functions

In [None]:
def annotate_function_with_invariants(function_name, function_invariants):
    function = globals()[function_name]
    function_code = inspect.getsource(function)
    function_ast = ast.parse(function_code)
    return annotate_function_ast_with_invariants(function_ast, function_invariants)

In [None]:
def annotate_function_ast_with_invariants(function_ast, function_invariants):
    annotated_function_ast = EmbeddedInvariantTransformer(function_invariants).visit(function_ast)
    return annotated_function_ast

#### Preconditions

In [None]:
class PreconditionTransformer(ast.NodeTransformer):
    def __init__(self, invariants):
        self.invariants = invariants
        super().__init__()
        
    def preconditions(self):
        preconditions = []
        for (prop, var_names) in self.invariants:
            assertion = "assert " + instantiate_prop(prop, var_names) + ', "violated precondition"'
            assertion_ast = ast.parse(assertion)

            if assertion.find('return_value') < 0:
                preconditions += assertion_ast.body

        return preconditions
    
    def insert_assertions(self, body):
        preconditions = self.preconditions()
        try:
            docstring = body[0].value.s
        except (AttributeError, IndexError):  # first statement is not a string literal
            docstring = None
            
        if docstring:
            return [body[0]] + preconditions + body[1:]
        else:
            return preconditions + body

    def visit_FunctionDef(self, node):
        """Add invariants to function"""
        # print(ast.dump(node))
        node.body = self.insert_assertions(node.body)
        return node    

In [None]:
class EmbeddedInvariantTransformer(PreconditionTransformer):
    pass

In [None]:
with EmbeddedInvariantAnnotator() as annotator:
    my_sqrt(5)

In [None]:
print_content(annotator.functions_with_invariants(), '.py')

In [None]:
with EmbeddedInvariantAnnotator() as annotator:
    y = sum3(3, 4, 5)
    y = sum3(-3, -4, -5)
    y = sum3(0, 0, 0)

In [None]:
print_content(annotator.functions_with_invariants(), '.py')

#### Postconditions

We make a few simplifying assumptions: 

* Variables do not change during execution.
* There is a single `return` statement at the end of the function.
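
The first assumption matters because postconditions are checked at the end of the function, where they see the _current_ values of the parameters, not the values the function was called with. The following sketch illustrates the difference; `double_in_place()` and `original_x` are hypothetical names chosen for this example and do not come from the chapter's code.

```python
def double_in_place(x):
    """Hypothetical function that reassigns its parameter before returning."""
    original_x = x      # saved copy of the value passed by the caller
    x = x * 2           # the parameter is reassigned during execution
    return_value = x

    # A check at the end of the function sees the current value of x ...
    holds_on_current = (return_value == 2 * x)
    # ... whereas a check against the saved value sees the initial one
    holds_on_original = (return_value == 2 * original_x)

    return return_value, holds_on_current, holds_on_original

result = double_in_place(3)
```

Here, the property `return_value == 2 * x` is true for the value passed by the caller, but a check inserted before the `return` would evaluate it against the reassigned `x` and fail.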

In [None]:
class EmbeddedInvariantTransformer(PreconditionTransformer):
    def postconditions(self):
        postconditions = []

        for (prop, var_names) in self.invariants:
            assertion = "assert " + instantiate_prop(prop, var_names) + ', "violated postcondition"'
            assertion_ast = ast.parse(assertion)

            if assertion.find('return_value') >= 0:
                postconditions += assertion_ast.body

        return postconditions
    
    def insert_assertions(self, body):
        new_body = super().insert_assertions(body)
        postconditions = self.postconditions()

        body_ends_with_return = isinstance(new_body[-1], ast.Return)
        if body_ends_with_return:
            saver = "return_value = " + astunparse.unparse(new_body[-1].value)
        else:
            saver = "return_value = None"
    
        saver_ast = ast.parse(saver)
        postconditions = saver_ast.body + postconditions  # insert statements, not the Module node

        if body_ends_with_return:
            return new_body[:-1] + postconditions + [new_body[-1]]
        else:
            return new_body + postconditions

In [None]:
with EmbeddedInvariantAnnotator() as annotator:
    my_sqrt(5)

In [None]:
my_sqrt_def = annotator.functions_with_invariants()

In [None]:
print_content(my_sqrt_def, '.py')

In [None]:
exec(my_sqrt_def.replace('my_sqrt', 'my_sqrt_annotated'))

In [None]:
with ExpectError():
    my_sqrt_annotated(-1)

In [None]:
with EmbeddedInvariantAnnotator() as annotator:
    y = sum3(3, 4, 5)
    y = sum3(-3, -4, -5)
    y = sum3(0, 0, 0)

In [None]:
print_content(annotator.functions_with_invariants(), '.py')

In [None]:
def list_length(L):
    if L == []:
        length = 0
    else:
        length = 1 + list_length(L[1:])
    return length

In [None]:
with EmbeddedInvariantAnnotator() as annotator:
    length = list_length([1, 2, 3])

print_content(annotator.functions_with_invariants(), '.py')

In [None]:
def print_sum(a, b):
    print(a + b)

In [None]:
with EmbeddedInvariantAnnotator() as annotator:
    print_sum(31, 45)

In [None]:
print_content(annotator.functions_with_invariants(), '.py')