# Symbolic Fuzzing

One of the problems with traditional methods of fuzzing is that they fail to penetrate deeply into the program. Quite often, a specific branch is executed only for very specific inputs, which may represent an extremely small fraction of the input space. Traditional fuzzing methods rely on chance to produce the inputs they need. However, relying on randomness to generate the values we want is a bad idea when the space to be explored is large. For example, a function that accepts a string already has $2^{80}$ possible inputs, even if one considers only the first $10$ characters. If one is looking for a specific string, random generation of values will take a few thousand years even on one of the supercomputers.
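To make the size of this search space concrete, here is a back-of-the-envelope calculation. The rate of $10^{13}$ attempts per second is an assumption chosen purely for illustration; it is not a measurement of any particular machine.

```python
# Ten 8-bit characters give 256**10 == 2**80 distinct strings.
CHARS = 10
BITS_PER_CHAR = 8
inputs = 2 ** (CHARS * BITS_PER_CHAR)
assert inputs == 256 ** CHARS

# Assumed rate (hypothetical): 10**13 attempts per second.
ATTEMPTS_PER_SECOND = 10 ** 13
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60

# Time needed to try every input once at the assumed rate.
years = inputs / ATTEMPTS_PER_SECOND / SECONDS_PER_YEAR
print("%d inputs, roughly %.0f years to try them all" % (inputs, years))
```

Under this assumption, exhausting the space takes on the order of a few thousand years; a slower machine only makes matters worse.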

Symbolic execution is a way out of this problem. A program is a computation that can be treated as a system of equations that obtains the output values from the given inputs. Executing the program symbolically -- that is, solving these equations mathematically -- along with any specified objective such as covering a particular branch or obtaining a particular output will get us inputs that can accomplish this task. Unfortunately, symbolic execution can rapidly become unwieldy as the number of paths through the program increases. A practical alternative is *concolic* execution, which combines symbolic and concrete execution, with concrete execution guiding symbolic execution along a path through the program.

In this chapter, we investigate how **concolic execution** can be implemented, and how it can be used to obtain interesting values for fuzzing.

**Prerequisites**

* You should have read the [chapter on coverage](Coverage.ipynb).
* Some knowledge of inheritance in Python is required.
* A familiarity with the [chapter on search based fuzzing](SearchBasedFuzzer.ipynb) would be useful.
* A familiarity with the basic idea of [SMT solvers](https://en.wikipedia.org/wiki/Satisfiability_modulo_theories) would be useful.

## Using Symbolic Variables for Coverage

In the chapter on [parsing and recombining inputs](SearchBasedFuzzer.ipynb), we saw how difficult it was to generate inputs for `process_vehicle()` -- a simple function that accepts a string. The solution given there was to rely on preexisting sample inputs. However, this solution is inadequate as it assumes the existence of sample inputs. What if there are no sample inputs at hand?

For a simpler example, let us consider the following function. Can we generate inputs to cover all the paths?

In [None]:
def check_triangle(a,b,c):
    if a == b:
        if a == c:
            if b == c:
                return "Equilateral"
            else:
                return "Isosceles"
        else:
            return "Isosceles"
    else:
        if b != c:
            if a == c:
                return "Isosceles"
            else:
                return "Scalene"
        else:
            return "Isosceles"

The control flow graph of this function can be represented as follows:

In [None]:
import fuzzingbook_utils

In [None]:
from ControlFlow import PyCFG, CFGNode, to_graph, gen_cfg

In [None]:
import inspect

In [None]:
from graphviz import Source, Graph

In [None]:
Source(to_graph(gen_cfg(inspect.getsource(check_triangle))))

The possible execution paths traced by the program can be represented as follows, with the numbers indicating the specific line numbers executed.

```python
<path 1> [1, 2, 3, 4, 5, Equilateral]
<path 2> [1, 2, 3, 4, 7, Isosceles]
<path 3> [1, 2, 3, 9, Isosceles]
<path 4> [1, 2, 11, 12, 13, Isosceles]
<path 5> [1, 2, 11, 12, 15, Scalene]
<path 6> [1, 2, 11, 17, Isosceles]
```

Consider the `<path 1>`. If we want to trace this path, we need to execute the following statements in order.

```python
1: check_triangle(a, b, c)
2: if (a == b) -> True
3: if (a == c) -> True
4: if (b == c) -> True
5: return 'Equilateral'
```

That is, any execution that traces this path has to start with values for `a`, `b`, and `c` that obey the constraints on lines 2 to 4: `2: (a == b)` evaluates to `True`, `3: (a == c)` evaluates to `True`, and `4: (b == c)` evaluates to `True`. Can we generate inputs such that these constraints are satisfied?
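To see what "satisfying the constraints" means, the path constraints can be written as plain Python predicates. The sketch below is a naive enumeration over a tiny, arbitrarily chosen domain (`range(3)`); the helper `find_input()` is hypothetical and is only meant as an illustration, since enumeration does not scale to realistic input spaces.

```python
from itertools import product

def find_input(constraints, domain=range(3)):
    """Return the first (a, b, c) from domain^3 satisfying all constraints, or None."""
    for a, b, c in product(domain, repeat=3):
        if all(check(a, b, c) for check in constraints):
            return a, b, c
    return None

# Constraints of <path 1>: all three comparisons evaluate to True.
path_1 = [
    lambda a, b, c: a == b,
    lambda a, b, c: a == c,
    lambda a, b, c: b == c,
]

# Constraints of <path 2>: the last comparison is inverted.
path_2 = path_1[:2] + [lambda a, b, c: not (b == c)]

print(find_input(path_1))
print(find_input(path_2))
```

Within this small domain, the first query yields a candidate input, while the second yields `None`: if `a == b` and `a == c` both hold, `b == c` cannot be false.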

One of the ways to solve such constraints is to use an [SMT solver](https://en.wikipedia.org/wiki/Satisfiability_modulo_theories) such as [z3](http://theory.stanford.edu/~nikolaj/programmingz3.html). Here is how one would go about solving the set of equations using *z3*.

In [None]:
import z3

First, we declare a set of variables as symbolic integers using *z3*.

In [None]:
a, b, c = z3.Int('a'), z3.Int('b'), z3.Int('c')

We can now ask *z3* to solve the set of equations for us as follows.

In [None]:
z3.solve(a == b, a == c, b == c)

The solution exposes the first problem in our program: it does not check whether the sides are positive numbers. Assume for now that we do not have that restriction. Does our program correctly follow the path described?

In [None]:
from Coverage import Coverage

In [None]:
with Coverage() as cov:
    assert check_triangle(0, 0, 0) == 'Equilateral'

To plot the path taken, we need to extract edges from the coverage.
We define a procedure `cov_to_arcs()` to translate our coverage to a list of edges.

In [None]:
def cov_to_arcs(cov):
    arcs = []
    last = None
    for fn, ln in cov._trace:
        if last is not None:
            arcs.append((last, ln))
        last = ln
    return arcs

In [None]:
cov_to_arcs(cov)

We can now determine the path taken.

In [None]:
check_triangle_src = inspect.getsource(check_triangle).strip()

In [None]:
Source(to_graph(gen_cfg(check_triangle_src), arcs=cov_to_arcs(cov)))

The path taken is indeed `<path 1>`.

Similarly, for solving `<path 2>` we simply need to invert the condition at `<line 4>`:

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(a == b, a == c, z3.Not(b == c))

The symbolic execution suggests that there is no solution. A moment's reflection will convince us that this is indeed true: if `a == b` and `a == c`, then `b == c` must hold as well. Let us proceed with the other paths. The `<path 3>` can be obtained by inverting the condition at `<line 3>`.

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(a == b, z3.Not(a == c))

In [None]:
with Coverage() as cov:
    assert check_triangle(1, 1, 0) == 'Isosceles'
[i for fn, i in cov._trace if fn == 'check_triangle']

How about `<path 4>`?

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(z3.Not(a == b), b != c, a == c)

As we mentioned earlier, our program does not account for sides with zero or negative length. We can modify our program to check for zero and negative input. However, do we always have to make sure that every function has to account for all possible inputs? It is possible that the `check_triangle` is not directly exposed to the user, and it is called from another function that already guarantees that the inputs would be positive.

We can easily add such a precondition here.

In [None]:
pre_condition = z3.And(a > 0, b > 0, c > 0)

In [None]:
z3.solve(pre_condition, z3.Not(a == b), b != c, a == c)

In [None]:
with Coverage() as cov:
    assert check_triangle(1, 2, 1) == 'Isosceles'
[i for fn, i in cov._trace if fn == 'check_triangle']

Continuing to `<path 5>`:

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(pre_condition, z3.Not(a == b), b != c, z3.Not(a == c))

And indeed it is a *Scalene* triangle.

In [None]:
with Coverage() as cov:
    assert check_triangle(3, 1, 2) == 'Scalene'
[i for fn, i in cov._trace if fn == 'check_triangle']

Finally, for `<path 6>` the procedure is similar.

In [None]:
z3.solve(pre_condition, z3.Not(a == b), z3.Not(b != c))

In [None]:
with Coverage() as cov:
    assert check_triangle(2, 1, 1) == 'Isosceles'
[i for fn, i in cov._trace if fn == 'check_triangle']

That is, using simple symbolic computation, we were able to easily see that (1) some of the paths are not reachable, and (2) some of the conditions were insufficient -- that is, we needed preconditions. What about the total coverage obtained?

In [None]:
with Coverage() as cov:
    assert check_triangle(0, 0, 0) == 'Equilateral'
    assert check_triangle(1, 1, 0) == 'Isosceles'
    assert check_triangle(1, 2, 1) == 'Isosceles'
    assert check_triangle(3, 1, 2) == 'Scalene'
    assert check_triangle(2, 1, 1) == 'Isosceles'

In [None]:
covered = set([lineno for method,lineno in cov._trace])
for i,s in enumerate(check_triangle_src.split('\n')):
    print('%s %2d: %s' % ('#' if i+1 in covered else ' ', i+1, s))

The coverage is as expected. The generated values do seem to cover all code that can be covered. However, doing this by hand is tedious and error prone. What we need is the ability to extract *all paths* in the program and symbolically execute each path, which will generate the inputs required to cover all reachable portions of the program.

Doing this is fairly simple for a program such as `check_triangle()` which does not contain loops or reassignments. We first define `get_all_paths()` that, given a starting point, will recursively examine all child nodes, and return the traversed paths.

### Simple Symbolic Fuzzer

We define a simple *symbolic fuzzer* that can generate input values *symbolically* with the following assumptions:

* There are no loops in the program
* The function is self-contained.
* No recursion.
* No reassignments for variables.

The key idea is as follows: We traverse through the control flow graph from the entry point, and generate all possible paths to a given depth. Then we collect constraints that we encountered along the path, and generate inputs that will traverse the program up to that point.

In [None]:
from Fuzzer import Fuzzer

We start by extracting the control flow graph of the function passed.

In [None]:
class SimpleSymbolicFuzzer(Fuzzer):
    def __init__(self, fn, **kwargs):
        self.fn = fn
        self.fn_src =  inspect.getsource(fn)
        self.fn_args = list(inspect.signature(fn).parameters)
        self.py_cfg = PyCFG()
        self.py_cfg.gen_cfg(self.fn_src)
        self.fnenter, self.fnexit = self.py_cfg.functions[fn.__name__]
        self.paths = None
        self.last_path = None
        self.removed_solutions = []
        self.z3 = z3.Solver()
        self.options(kwargs)

We define `MAX_DEPTH` as the depth to which one should attempt to trace the execution.

In [None]:
MAX_DEPTH = 100

Since some of the paths may not be satisfiable, we define `MAX_TRIES` as the maximum number of attempts we will make to produce a value before giving up.

In [None]:
MAX_TRIES = 100

In [None]:
MAX_ITER = 100

In [None]:
class SimpleSymbolicFuzzer(SimpleSymbolicFuzzer):
    def options(self, kwargs):
        self.max_depth = kwargs.get('max_depth', MAX_DEPTH)
        self.max_tries = kwargs.get('max_tries', MAX_TRIES)
        self.max_iter = kwargs.get('max_iter', MAX_ITER)
        self.symbolic_fn = kwargs.get('symbolic_fn', 'z3.Int')
        self._options = kwargs

The initialization generates a control flow graph and hooks it to `fnenter` and `fnexit`.

In [None]:
symfz_ct = SimpleSymbolicFuzzer(check_triangle)

In [None]:
symfz_ct.fnenter, symfz_ct.fnexit

#### get_all_paths
We can use the `fnenter` to recursively retrieve all paths in the function.

In [None]:
class SimpleSymbolicFuzzer(SimpleSymbolicFuzzer):
    def get_all_paths(self, fenter, depth=0):
        if depth > self.max_depth:
            raise Exception('Maximum depth exceeded')
        if not fenter.children:
            return [[(0, fenter)]]

        fnpaths = []
        for idx, child in enumerate(fenter.children):
            child_paths = self.get_all_paths(child, depth+1)
            for path in child_paths:
                fnpaths.append([(idx, fenter)] + path)
        return fnpaths

In [None]:
symfz_ct = SimpleSymbolicFuzzer(check_triangle)
paths = symfz_ct.get_all_paths(symfz_ct.fnenter)
paths[1]
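The recursion in `get_all_paths()` can be tried out in isolation on a toy graph. In the sketch below, the `Node` class is a hypothetical stand-in for a CFG node (it only has a label and a list of children), and `all_paths()` mirrors the method above without depending on `PyCFG`.

```python
class Node:
    """Hypothetical stand-in for a CFG node: a label plus child nodes."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def all_paths(node, depth=0, max_depth=100):
    """Enumerate every root-to-leaf path as a list of (child_index, node) pairs."""
    if depth > max_depth:
        raise Exception('Maximum depth exceeded')
    if not node.children:
        return [[(0, node)]]
    paths = []
    for idx, child in enumerate(node.children):
        for rest in all_paths(child, depth + 1, max_depth):
            paths.append([(idx, node)] + rest)
    return paths

# A toy graph: one `if` with a return statement on each branch.
then_ret = Node('return "yes"')
else_ret = Node('return "no"')
root = Node('enter', [Node('if x > 0', [then_ret, else_ret])])

labelled = [[(idx, n.label) for idx, n in path] for path in all_paths(root)]
for path in labelled:
    print(path)
```

Note that the index stored next to each node is the index of the child that was followed *from* that node; this is what later allows a branch condition to be taken as is (index 0) or negated.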

In [None]:
import ast, astunparse

The function `to_src()` allows us to *unparse* an expression.

In [None]:
def to_src(astnode):
    return astunparse.unparse(astnode).strip()

#### names

We need the names of the variables used in an expression in order to declare them as symbolic variables. The function `names()` extracts the variables used.

In [None]:
def names(astnode):
    lst = []
    if isinstance(astnode, ast.BoolOp):
        for i in astnode.values:
            lst.extend(names(i))
    elif isinstance(astnode, ast.BinOp):
        lst.extend(names(astnode.left))
        lst.extend(names(astnode.right))
    elif isinstance(astnode, ast.UnaryOp):
        lst.extend(names(astnode.operand))
    elif isinstance(astnode, ast.Call):
        for i in astnode.args:
            lst.extend(names(i))
    elif isinstance(astnode, ast.Compare):
        lst.extend(names(astnode.left))
        for i in astnode.comparators:
            lst.extend(names(i))
    elif isinstance(astnode, ast.Name):
        lst.append(astnode.id)
    elif isinstance(astnode, ast.Expr):
        lst.extend(names(astnode.value))
    elif isinstance(astnode, (ast.Num, ast.Str, ast.Tuple, ast.NameConstant)):
        pass
    elif isinstance(astnode, ast.Assign):
        for t in astnode.targets:
            lst.extend(names(t))
        lst.extend(names(astnode.value))
    elif isinstance(astnode, ast.Module):
        for b in astnode.body:
            lst.extend(names(b))
    else:
        raise Exception(str(astnode))
    return lst

With this, we can now extract the variables used in an expression.

In [None]:
v = ast.parse('fn(x+z,y>(a+b)) == c')
names(v)
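For comparison, the standard library's `ast.walk()` offers a shorter way to collect identifiers. The helper `names_via_walk()` below is a hypothetical alternative and not part of this chapter's code. Unlike `names()`, it also reports the name of a called function (`fn` here), so it would not be a drop-in replacement where only variables should be declared as symbolic.

```python
import ast

def names_via_walk(src):
    """Collect every identifier in src, including called function names."""
    tree = ast.parse(src)
    return sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})

found = names_via_walk('fn(x+z,y>(a+b)) == c')
print(found)
```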

#### extract_constraints

For any given path, we define a function `extract_constraints()` to extract the constraints in `z3` format.

In [None]:
class SimpleSymbolicFuzzer(SimpleSymbolicFuzzer):
    def extract_constraints(self, path):
        last = None
        predicates = []
        my_names = []
        for (idx, elt) in path:
            if last is not None:
                order = {c.i(): i for i,c in enumerate(last.children)}
                if isinstance(last.ast_node, ast.AnnAssign):
                    if last.ast_node.target.id in {'_if'}:
                        s = to_src(last.ast_node.annotation)
                        my_names.extend(names(ast.parse(s)))
                        predicates.append(("%s" if order[elt.i()] == 0 else "z3.Not%s") % s)
                elif isinstance(last.ast_node, ast.Assign):
                    my_names.extend(names(last.ast_node))
                    predicates.append(to_src(last.ast_node))
                else:
                    pass
            last = elt
        return list(set(my_names)), predicates

In [None]:
symfz_ct = SimpleSymbolicFuzzer(check_triangle)
paths = symfz_ct.get_all_paths(symfz_ct.fnenter)
symfz_ct.extract_constraints(paths[1])
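The way branch decisions become constraint strings can be sketched without a CFG. The function `branch_predicates()` below is hypothetical; it assumes that each condition arrives already wrapped in parentheses (as the expected strings elsewhere in this chapter suggest the unparser produces), which is why `'z3.Not%s'` needs no parentheses of its own.

```python
def branch_predicates(decisions):
    """Turn (condition_source, branch_index) pairs into z3-style constraint strings.

    Branch index 0 means the condition held; any other index means it did not.
    """
    return [
        cond if taken == 0 else 'z3.Not%s' % cond
        for cond, taken in decisions
    ]

# <path 3> of check_triangle(): a == b holds, a == c does not.
path_3 = branch_predicates([('(a == b)', 0), ('(a == c)', 1)])
print(path_3)
```

The resulting strings are only text at this point; they become actual constraints once they are evaluated in a scope where `a`, `b`, and `c` are bound to symbolic variables.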

#### Fuzz

To actually generate solutions, we first need to extract all paths. We then choose a particular path and extract its constraints, which are solved using *z3*. Each solution found is recorded and excluded from later runs, so that repeated calls produce different values.

In [None]:
class SimpleSymbolicFuzzer(SimpleSymbolicFuzzer):
    def fuzz(self):
        init = False
        if self.paths is None:
            self.paths = self.get_all_paths(self.fnenter)
            self.last_path = len(self.paths)
            init = True
        for i in range(1, self.max_tries):
            self.last_path -= 1
            if self.last_path == -1:
                self.last_path = len(self.paths) - 1
                
            # re-initializing does not seem problematic.
            # a = z3.Int('a').get_id() remains the same.
            my_names, constraints = self.extract_constraints(self.paths[self.last_path])
            st = "%s = %ss('%s')" % (', '.join(my_names), self.symbolic_fn, ' '.join(my_names))
            exec(st)

            self.z3.push()
            st = 'self.z3.add(%s)' % ', '.join(constraints)
            eval(st)
            result = {}
            predicate = None
            if self.z3.check() == z3.sat:
                m = self.z3.model()
                result = {d.name(): m[d] for d in m.decls()}
                predicate = 'z3.And(%s)' % ','.join(["%s == %s" % (d.name(), m[d]) for d in m.decls()])
                self.removed_solutions.append(predicate)
            self.z3.pop()
            if predicate:
                st = 'self.z3.add(z3.Not(%s))' % self.removed_solutions[-1]
                eval(st)
            if result:
                return result
        return {}

The fuzzer can be used as follows.

In [None]:
a, b, c = None, None, None
symfz_ct = SimpleSymbolicFuzzer(check_triangle)
for i in range(1,10):
    r = symfz_ct.fuzz()
    v = check_triangle(r['a'].as_long(), r['b'].as_long(), r['c'].as_long())
    print(r, v)

#### Problems with the Simple Fuzzer

As we mentioned earlier, the `SimpleSymbolicFuzzer` cannot yet deal with variable reassignments. Further, it also fails to account for any loops. For example, consider the following program.

In [None]:
def gcd(a, b):
    if a<b:
        c = a
        a = b
        b = c

    while b != 0 :
        c = a
        a = b
        b = c % b
    return a

In [None]:
Source(to_graph(gen_cfg(inspect.getsource(gcd))))

In [None]:
from ExpectError import ExpectError

In [None]:
with ExpectError():
    symfz_gcd = SimpleSymbolicFuzzer(gcd)
    for i in range(1,100):
        r = symfz_gcd.fuzz()
        v = gcd(r['a'].as_long(), r['b'].as_long())
        print(r, v)

### Advanced Symbolic Fuzzer

We next define `AdvancedSymbolicFuzzer` that can deal with reassignments and loops.

In [None]:
class AdvancedSymbolicFuzzer(SimpleSymbolicFuzzer):
    def options(self, kwargs):
        super().options(kwargs)

First, we import the `ast` module.

In [None]:
import ast
import astunparse

The `ast` module can be used to extract the AST of an expression.

In [None]:
ast.parse('x == y')

The parsed *AST* contains expressions within an AST module. We define a shorthand `get_expression()` to extract the contents of the module.

In [None]:
def get_expression(src):
    return ast.parse(src).body[0].value

In [None]:
get_expression('x == y')

In [None]:
to_src(get_expression('x == y'))

#### rename_variables

Next, we want to rename all variables present in an expression such that the variables are annotated with their usage count. This makes it possible to determine variable reassignments.

We define the `rename_variables()` function that, when given an `env` that contains the current usage index of different variables, renames the variables in the passed in AST node with the annotations.

That is, if `env[v] == 1`, then `v` is renamed to `_v_1`.

In [None]:
def rename_variables(astnode, env):
    if isinstance(astnode, ast.BoolOp):
        fn = 'z3.And' if isinstance(astnode.op, ast.And) else 'z3.Or'
        return ast.Call(
            ast.Name(fn, None),
            [rename_variables(i, env) for i in astnode.values], [])
    elif isinstance(astnode, ast.BinOp):
        return ast.BinOp(
            rename_variables(astnode.left, env), astnode.op,
            rename_variables(astnode.right, env))
    elif isinstance(astnode, ast.UnaryOp):
        if isinstance(astnode.op, ast.Not):
            return ast.Call(
                ast.Name('z3.Not', None),
                [rename_variables(astnode.operand, env)], [])
        else:
            return ast.UnaryOp(astnode.op,
                               rename_variables(astnode.operand, env))
    elif type(astnode) is ast.Call:
        return ast.Call(astnode.func,
                        [rename_variables(i, env) for i in astnode.args],
                        astnode.keywords)
    elif type(astnode) is ast.Compare:
        return ast.Compare(
            rename_variables(astnode.left, env), astnode.ops,
            [rename_variables(i, env) for i in astnode.comparators])
    elif type(astnode) is ast.Name:
        if astnode.id not in env:
            env[astnode.id] = 0
        num = env[astnode.id]
        return ast.Name('_%s_%d' % (astnode.id, num), astnode.ctx)
    elif type(astnode) is ast.Return:
        return ast.Return(rename_variables(astnode.value, env))
    else:
        return astnode

To verify that it works as intended, we start with an environment.

In [None]:
env = {'x':1}

In [None]:
ba = get_expression('x == 1 and y == 2')
type(ba)

In [None]:
assert to_src(rename_variables(ba, env)) == 'z3.And((_x_1 == 1), (_y_0 == 2))'

In [None]:
bo = get_expression('x == 1 or y == 2')
type(bo.op)

In [None]:
assert to_src(rename_variables(bo, env)) == 'z3.Or((_x_1 == 1), (_y_0 == 2))'

In [None]:
b = get_expression('x + y')
type(b)

In [None]:
assert to_src(rename_variables(b, env)) == '(_x_1 + _y_0)'

In [None]:
u = get_expression('-y')
type(u)

In [None]:
assert to_src(rename_variables(u, env)) == '(- _y_0)'

In [None]:
un = get_expression('not y')
type(un.op)

In [None]:
assert to_src(rename_variables(un, env)) == 'z3.Not(_y_0)'

In [None]:
c = get_expression('x == y')
type(c)

In [None]:
assert to_src(rename_variables(c, env)) == '(_x_1 == _y_0)'

In [None]:
f = get_expression('fn(x,y)')
type(f)

In [None]:
assert to_src(rename_variables(f, env)) == 'fn(_x_1, _y_0)'

In [None]:
env
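The renaming step can also be sketched with the standard library alone. The `Renamer` class and the `rename()` helper below are hypothetical and assume Python 3.9 or later (for `ast.unparse()`). Unlike `rename_variables()`, this sketch does not translate `and`, `or`, and `not` into `z3.And`, `z3.Or`, and `z3.Not`, and it would also rename the function name in a call.

```python
import ast

class Renamer(ast.NodeTransformer):
    """Rename every variable v to _v_<n>, where n = env[v] (0 if unseen)."""
    def __init__(self, env):
        self.env = env

    def visit_Name(self, node):
        n = self.env.setdefault(node.id, 0)
        return ast.copy_location(
            ast.Name(id='_%s_%d' % (node.id, n), ctx=node.ctx), node)

def rename(src, env):
    """Parse a single expression, rename its variables, and unparse it."""
    expr = ast.parse(src, mode='eval').body
    return ast.unparse(Renamer(env).visit(expr))

demo_env = {'x': 1}
renamed = rename('x + y * x', demo_env)
print(renamed)
print(demo_env)
```

As in `rename_variables()`, a variable that has not been seen before is entered into the environment with index 0.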

Next, we want to process the CFG, and correctly transform the paths.

#### PNode

To keep track of assignments in the CFG, we define a data structure `PNode` that stores the current CFG node.

In [None]:
class PNode:
    def __init__(self, idx, cfgnode, parent=None, order=0):
        self.idx, self.cfgnode, self.parent, self.order = idx, cfgnode, parent, order

    def __repr__(self):
        return "PNode:%d[%s order:%d]" % (self.idx, str(self.cfgnode),
                                          self.order)

Defining a new `PNode` is done as follows.

In [None]:
cfg = PyCFG()
cfg.gen_cfg(inspect.getsource(gcd))
fnenter, fnexit = cfg.functions['gcd']

In [None]:
PNode(0, fnenter)

##### copy
The `copy()` method generates a copy of the node to be kept as the parent of a child, recording which branch was taken (the `order` of the child).

In [None]:
class PNode(PNode):
    def copy(self, order):
        return PNode(self.idx, self.cfgnode, self.parent, order)

The copy operation is used as follows.

In [None]:
PNode(0, fnenter).copy(1)

##### explore

A problem we had with our `SimpleSymbolicFuzzer` is that it explored a path to completion before attempting another. However, this is not optimal. One may want to explore the graph in a more step-wise manner, expanding every possible execution one step at a time.

Hence, we define `explore()`, which explores the children of a node, if any, one step at a time. If done exhaustively, this will generate all paths from a starting node until no more children are left. We made `PNode` a container class so that this iteration can be driven from outside, and stopped if, say, a maximum number of iterations is reached, or if certain paths need to be prioritized.

In [None]:
class PNode(PNode):
    def explore(self):
        return [
            PNode(self.idx + 1, n, self.copy(i))
            for (i, n) in enumerate(self.cfgnode.children)
        ]

We can use `explore()` as follows.

In [None]:
PNode(0, fnenter).explore()

In [None]:
PNode(0, fnenter).explore()[0].explore()

##### get_path_to_root

The method `get_path_to_root()` walks up the child->parent chain, retrieving the complete chain up to the topmost parent.

In [None]:
class PNode(PNode):
    def get_path_to_root(self):
        path = []
        n = self
        while n:
            path.append(n)
            n = n.parent
        return list(reversed(path))

In [None]:
p = PNode(0, fnenter)
[s.get_path_to_root() for s in p.explore()[0].explore()[0].explore()[0].explore()]
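The interplay of step-wise exploration and parent links can be seen on a toy example. The `Step` class and the successor map `graph` below are hypothetical stand-ins for `PNode` and the CFG; the sketch expands a whole frontier one step at a time and then reconstructs each path from its last node.

```python
class Step:
    """Hypothetical path node: a label, the successor map, and a parent link."""
    def __init__(self, label, graph, parent=None):
        self.label, self.graph, self.parent = label, graph, parent

    def explore(self):
        # One step only: create a node per successor, each pointing back to self.
        return [Step(s, self.graph, self) for s in self.graph.get(self.label, [])]

    def path_to_root(self):
        path, n = [], self
        while n:
            path.append(n.label)
            n = n.parent
        return list(reversed(path))

# Toy successor map (an assumption for illustration, not a real CFG).
graph = {'enter': ['if'], 'if': ['then', 'else'], 'then': ['exit'], 'else': ['exit']}

frontier = [Step('enter', graph)]
for _ in range(2):  # expand the whole frontier by two steps
    frontier = [child for node in frontier for child in node.explore()]

found_paths = [node.path_to_root() for node in frontier]
print(found_paths)
```

Because every node only stores a link to its parent, paths that share a prefix share the corresponding nodes, and the loop driving the expansion can stop at any point.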

The string representation of the node is in `z3` solvable form.

In [None]:
class PNode(PNode):
    def __str__(self):
        path = self.get_path_to_root()
        #print([p.cfgnode.lineno() for p in path])
        ssa_path = to_single_assignment(path)
        return ', '.join([to_src(p) for p in ssa_path])

However, before using it, we need to define `to_single_assignment()`, which builds on the `rename_variables()` function defined earlier.

###### to_single_assignment

We need to rename the variables used along a path. The first assignment to a variable, `v = xxx`, should rename it to `_v_0`; any later assignment such as `v = v + 1` should be transformed to `_v_1 = _v_0 + 1`; and later conditionals such as `v == x` should be transformed to `(_v_1 == _x_0)`. The function `to_single_assignment()` does this for a given path.

In [None]:
def to_single_assignment(path):
    env = {}
    my_vars = set()
    new_path = []
    for node in path:
        ast_node = node.cfgnode.ast_node
        new_node = None
        if isinstance(ast_node, ast.AnnAssign) and ast_node.target.id in {'exit'}:
            new_path.append(None)
        elif isinstance(ast_node, ast.AnnAssign) and ast_node.target.id in {'enter'}:
            args = [ast.parse("%s == _%s_0" %(a.id, a.id)).body[0].value for a in ast_node.annotation.args]
            new_node = ast.Call(ast.Name('z3.And', None), args, [])
            new_path.append(new_node)
        elif isinstance(ast_node, ast.AnnAssign) and ast_node.target.id in {'_if', '_while'}:
            new_node = rename_variables(ast_node.annotation, env)
            if node.order != 0:
                assert node.order == 1
                new_node = ast.Call(ast.Name('z3.Not', None), [new_node], [])
            new_path.append(new_node)
        elif isinstance(ast_node, ast.Assign):
            assigned = ast_node.targets[0].id
            val = [rename_variables(ast_node.value, env)]
            if assigned not in env:
                env[assigned] = 0
            else:
                env[assigned] += 1
            target = ast.Name('_%s_%d' % (ast_node.targets[0].id, env[assigned]), None)
            new_node = ast.Expr(ast.Compare(target, [ast.Eq()], val))
            new_path.append(new_node)
        elif isinstance(ast_node, ast.Return):
            new_path.append(None)
        elif isinstance(ast_node, ast.Pass):
            new_path.append(None)
        else:
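            # Not every AST node has a `target`, so look it up defensively.
            target = getattr(ast_node, 'target', None)
            s = "NI %s %s" % (type(ast_node), getattr(target, 'id', ''))
            raise Exception(s)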
    return list(zip(path, new_path))

In [None]:
p = PNode(0, fnenter)
path = p.explore()[0].explore()[0].explore()[0].get_path_to_root()
pathpair = to_single_assignment(path)
spath = [(p,s) for p, s in pathpair if s is not None]

In [None]:
[to_src(s) for p,s in spath]

In [None]:
assert set(q for p,s in spath for q in names(s)) == {'_a_0', '_a_1', '_b_0', '_c_0', 'a', 'b'}
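The numbering scheme is easiest to follow on straight-line code. The function `to_ssa()` below is a hypothetical, simplified sketch that handles only statements of the form `name = expr` and assumes Python 3.9 or later (for `ast.unparse()`). As in `to_single_assignment()`, the right-hand side is renamed first, and only then is the index of the assigned variable advanced.

```python
import ast

def to_ssa(statements):
    """Rewrite simple `name = expr` statements into single-assignment equalities."""
    env, result = {}, []
    for stmt in statements:
        assign = ast.parse(stmt).body[0]
        # Rename the right-hand side first, using the versions seen so far.
        for node in ast.walk(assign.value):
            if isinstance(node, ast.Name):
                node.id = '_%s_%d' % (node.id, env.setdefault(node.id, 0))
        # Then advance the version of the assigned variable.
        target = assign.targets[0].id
        env[target] = env[target] + 1 if target in env else 0
        result.append('_%s_%d == %s' % (target, env[target], ast.unparse(assign.value)))
    return result

# Body of the `while` loop in gcd().
ssa = to_ssa(['c = a', 'a = b', 'b = c % b'])
for line in ssa:
    print(line)
```

Each assignment becomes an equality over fresh names, so the whole path can be handed to a solver as a conjunction of constraints.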

##### can_be_satisfied

One of the ways in which *concolic* execution simplifies *symbolic* execution is in the treatment of loops. Rather than trying to determine an invariant for a loop, we simply *unroll* the loops a number of times until we hit the `MAX_DEPTH` limit. However, not all loops will need to be unrolled until `MAX_DEPTH` is reached; some of them may exit earlier. Hence, it is necessary to check whether the given set of constraints can be satisfied before continuing to explore further.

In [None]:
class AdvancedSymbolicFuzzer(AdvancedSymbolicFuzzer):
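    def can_be_satisfied(self, p):
        defs, s2 = self.extract_constraints(p.get_path_to_root())
        s = z3.Solver()
        # Use one shared namespace so that the symbolic variables declared
        # by `defs` are visible when the constraints are added.
        env = {'s': s}
        exec(defs, globals(), env)
        exec("s.add(%s)" % s2, globals(), env)
        return s.check() == z3.sat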

#### extract_constraints

The `extract_constraints()` method generates the `z3` constraints from a path.

In [None]:
class AdvancedSymbolicFuzzer(AdvancedSymbolicFuzzer):
    def extract_constraints(self, path):
        spath_pair_ = to_single_assignment(path)
        spath = [(p,s) for p, s in spath_pair_ if s is not None]
        my_names = [q for p,s in spath for q in names(s)]
        defs = "%s = %ss('%s')" % (', '.join(my_names), self.symbolic_fn, ' '.join(my_names))
        predicates = []
        last = None
        for p_node,ast_node in spath:
            cfgnode = p_node.cfgnode
            if last is not None:
                order = {c.i(): i for i,c in enumerate(last[0].children)}
                if isinstance(last[0].ast_node, ast.AnnAssign):
                    if last[0].ast_node.target.id in {'_if', '_while'}:
                        s = to_src(last[1])
                        my_names.extend(names(last[1]))
                        predicates.append(s)
                    elif last[0].ast_node.target.id in {'enter'}:
                        s = to_src(last[1])
                        my_names.extend(names(last[1]))
                        predicates.append(s)
                    else:
                        print(last[0].ast_node.target.id)
                elif isinstance(last[0].ast_node, ast.Assign):
                    my_names.extend(names(last[1]))
                    predicates.append(to_src(last[1]))
                else:
                    print(last[0].ast_node)
                    pass
            last = (cfgnode, ast_node)
        umy_names = list(set(my_names))
        defs = "%s = %ss('%s')" % (', '.join(umy_names), self.symbolic_fn, ' '.join(umy_names))
        return defs, ','.join(predicates)

#### get_all_paths

The `get_all_paths()` method returns all paths one can generate from the function entry node (`fenter`), subject to the `max_depth` limit.

In [None]:
class AdvancedSymbolicFuzzer(AdvancedSymbolicFuzzer):
    def get_all_paths(self, fenter):
        path_lst = [PNode(0, fenter)]
        completed = []
        for i in range(self.max_iter):
            new_paths = [PNode(0, fenter)]
            #new_paths = []
            for path in path_lst:
                # explore each path once
                if path.cfgnode.children:
                    np = path.explore()
                    for p in np:
                        if path.idx > self.max_depth:
                            break
                        if self.can_be_satisfied(p):
                            new_paths.append(p)
                        else:
                            pass
                else:
                    completed.append(path)
            path_lst = new_paths
        return completed + path_lst

In [None]:
asymfz_gcd = AdvancedSymbolicFuzzer(gcd, max_iter=10, max_tries=10, max_depth=10)
paths = asymfz_gcd.get_all_paths(asymfz_gcd.fnenter)
print(len(paths))
paths[37].get_path_to_root()

In [None]:
for p,s in to_single_assignment(paths[37].get_path_to_root()):
    if s is not None:
        print(to_src(s))

In [None]:
my_names, constraints = asymfz_gcd.extract_constraints(paths[37].get_path_to_root())

In [None]:
constraints

In [None]:
my_names

In [None]:
class AdvancedSymbolicFuzzer(AdvancedSymbolicFuzzer):
    def fuzz(self):
        def to_original(k):
            if not k.startswith('_'):
                return k
            else:
                assert False
        if self.paths is None:
            self.paths = self.get_all_paths(self.fnenter)
            self.last_path = len(self.paths)
            assert self.last_path > 0
        for i in range(self.max_tries):
            self.last_path -= 1
            if self.last_path == -1:
                self.last_path = len(self.paths) - 1
                assert self.last_path > 0
            to_root = self.paths[self.last_path].get_path_to_root()
            my_names, constraints = self.extract_constraints(to_root)
            exec(my_names)
            
            cons = "self.z3.add(%s)" % constraints
            self.z3.push()
            eval(cons)
            result = {}
            if self.z3.check() == z3.sat:
                m = self.z3.model()
                result = {d.name(): m[d] for d in m.decls() if d.name() in self.fn_args}
                if len(result) != len(self.fn_args):
                    print(self.z3, "model:", m, 'last:', cons, 'to_root:', to_root )
                    result = {r:result[r] if r in result else None for r in self.fn_args}
                predicate = 'z3.And(%s)' % ','.join(["%s == %s" % (k,v) for k,v in result.items()])
                self.removed_solutions.append(predicate)
            self.z3.pop()
            # Only exclude a solution if this iteration actually found one.
            if result:
                st = 'self.z3.add(z3.Not(%s))' % self.removed_solutions[-1]
                eval(st)
            if result:
                return result
        return {}
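
The way `fuzz()` avoids returning the same input twice can be illustrated in isolation. The following is a minimal sketch, not the implementation above: a brute-force search over a small integer range stands in for Z3, and the names `solve()`, `enumerate_solutions()`, and `blocked` are hypothetical. Each solution found is turned into a blocking constraint, analogous to the `z3.Not(z3.And(...))` predicate built from `removed_solutions`.

```python
from itertools import product

def solve(constraints, names, domain=range(-3, 4)):
    """Brute-force stand-in for an SMT solver (an assumption of this sketch):
    return the first assignment over `domain` satisfying all constraints."""
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if all(c(env) for c in constraints):
            return env
    return None  # "unsat" within the searched domain

def enumerate_solutions(path_constraints, names, limit):
    """Collect up to `limit` distinct solutions, blocking each one found."""
    blocked = []    # plays the role of `removed_solutions`
    solutions = []
    for _ in range(limit):
        model = solve(path_constraints + blocked, names)
        if model is None:
            break
        solutions.append(model)
        # Blocking constraint: at least one variable must differ from `model`
        blocked.append(lambda env, m=model: any(env[k] != m[k] for k in m))
    return solutions

# Path condition: a < b and b > 0
path = [lambda e: e['a'] < e['b'], lambda e: e['b'] > 0]
sols = enumerate_solutions(path, ['a', 'b'], limit=3)
print(sols)
```

Every returned assignment satisfies the path condition, and no two assignments are equal. With a real solver, the blocking constraints would be added to the solver itself rather than kept in a Python list.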

In [None]:
asymfz_gcd = AdvancedSymbolicFuzzer(gcd, max_tries=10, max_iter=10, max_depth=10)
data = []
for i in range(10):
    r = asymfz_gcd.fuzz()
    print(r)
    data.append((r['a'].as_long(), r['b'].as_long()))
    v = gcd(*data[-1])
    print(">", repr(v))

In [None]:
with Coverage() as cov:
    for a,b in data:
        gcd(a,b)

In [None]:
covered = set([lineno for method,lineno in cov._trace])
source = inspect.getsource(gcd).strip().split('\n')
for i,s in enumerate(source):
    print('%s %2d: %s' % ('#' if i+1 in covered else ' ', i+1, s))

In [None]:
Source(to_graph(gen_cfg(inspect.getsource(gcd)), arcs=cov_to_arcs(cov)))

#### Example: roots
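Let us explore our new symbolic fuzzer a little more. Here is an implementation of the well-known quadratic formula for finding the roots of a quadratic equation. It approximates the square root of the discriminant iteratively rather than calling a library function.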
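
For reference, and assuming the standard form $ax^2 + bx + c = 0$ with $a \neq 0$, the formula that the function below is modeled on is

$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = -\frac{b}{2a} \pm \frac{\sqrt{d}}{2a}, \qquad d = b^2 - 4ac.$$

In the code, `d` is the discriminant, `s` is meant to approximate $\sqrt{d}$, `a2` is $2a$, and `ba2` is $b/(2a)$, so the two return values are `-ba2 + s/a2` and `-ba2 - s/a2`.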

In [None]:
def roots(a, b, c):
    d = b*b - 4 * a * c
    ax = 0.5 * d
    bx = 0
    while (ax - bx) > 0.1:
        bx = 0.5 * (ax + d/ax)
        ax = bx
    s = bx
    
    a2 = 2*a
    ba2 = b/a2
    return -ba2 + s/a2, -ba2 - s/a2

In [None]:
import math

def sym_to_float(v):
    # Convert a Z3 rational value to a float; a missing value becomes infinity
    if v is None:
        return math.inf
    return v.numerator_as_long() / v.denominator_as_long()

Let us investigate.

In [None]:
asymfz_roots = AdvancedSymbolicFuzzer(roots, max_tries=10, max_iter=10, max_depth=10, symbolic_fn='z3.Real')
with ExpectError():
    for i in range(100):
        r = asymfz_roots.fuzz()
        d = [sym_to_float(r[i]) for i in ['a', 'b', 'c']]
        v = roots(*d)
        print(d, v)

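##### roots - take 2

The first version divides by `2*a` without checking whether `a` is zero. Here is a version that guards against this case.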

In [None]:
def roots(a, b, c):
    d = b*b - 4 * a * c
    
    xa = 0.5 * d
    xb = 0
    while (xa - xb) > 0.1:
        xb = 0.5 * (xa + d/xa)
        xa = xb
    s = xb
    
    if a == 0:
        return -c/b

    a2 = 2*a
    ba2 = b/a2
    return -ba2 + s/a2, -ba2 - s/a2

In [None]:
asymfz_roots = AdvancedSymbolicFuzzer(roots, max_tries=10, max_iter=10, max_depth=10, symbolic_fn='z3.Real')
with ExpectError():
    for i in range(100):
        r = asymfz_roots.fuzz()
        d = [sym_to_float(r[i]) for i in ['a', 'b', 'c']]
        v = roots(*d)
        print(d, v)

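##### roots - take 3

The second version still computes `-c/b` when `a == 0`, which fails if `b` is zero as well. Here is a version that guards against that case, too.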

In [None]:
import math

In [None]:
def roots(a, b, c):
    d = b*b - 4 * a * c
    
    xa = 0.5 * d
    xb = 0
    while (xa - xb) > 0.1:
        xb = 0.5 * (xa + d/xa)
        xa = xb
    s = xb
    
    if a == 0:
        if b == 0:
            return math.inf
        return -c/b

    a2 = 2*a
    ba2 = b/a2
    return -ba2 + s/a2, -ba2 - s/a2

In [None]:
asymfz_roots = AdvancedSymbolicFuzzer(roots, max_tries=10, max_iter=10, max_depth=10, symbolic_fn='z3.Real')
for i in range(100):
    r = asymfz_roots.fuzz()
    print(r)
    d = [sym_to_float(r[i]) for i in ['a', 'b', 'c']]
    v = roots(*d)
    print(d, v)

Why are we not able to detect the problem with negative discriminants, for which the roots are no longer real numbers? Because we stop exploring a path at a pre-determined depth, and we do so without raising an error.
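
The effect of such a depth bound can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the path enumeration used by `AdvancedSymbolicFuzzer`: the function names are hypothetical, and a path is represented only by the outcomes of one loop condition.

```python
def enumerate_loop_paths(max_depth):
    """Enumerate the paths through a single `while` loop as lists of branch
    outcomes (True = loop again, False = exit), unrolling at most
    `max_depth` times."""
    paths = []
    prefix = []
    for _ in range(max_depth):
        paths.append(prefix + [False])  # path that leaves the loop here
        prefix = prefix + [True]        # keep iterating
    # `prefix` now describes a path that is still inside the loop.
    # It is dropped here, silently: no exception, no warning.
    return paths

def reaches(paths, iterations):
    """Does any enumerated path execute the loop body `iterations` times?"""
    return any(p.count(True) >= iterations for p in paths)

# Suppose a fault only shows after five loop iterations.
print(reaches(enumerate_loop_paths(3), 5))
print(reaches(enumerate_loop_paths(10), 5))
```

With a bound of 3, no enumerated path gets as far as five iterations, so the first call reports `False`, and nothing signals that deeper paths were skipped. Raising the bound to 10 makes such a path available.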

Our symbolic fuzzer works reasonably well for single functions that use `Int` or `Real` values. However, real-world applications often involve calls to multiple, possibly recursive, methods, which our implementation does not handle. Nor are real applications restricted to using just numbers. Further, some functions cannot be adequately solved symbolically (e.g., a hash function). One way to handle such functions is *Dynamic Symbolic Execution* (DSE). DSE can work around such calls by falling back to the concrete values observed during execution in place of symbolic ones when necessary.
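
The following sketch illustrates this workaround on a hash function. It is a hand-written illustration under simplifying assumptions, not a DSE engine: `opaque()`, `target()`, and `concolic_step()` are hypothetical names, and the "solving" step is done by hand because the constraint becomes trivial once the hash call is replaced by its observed value.

```python
import hashlib

def opaque(x):
    """A function a solver cannot reason about: first byte of a SHA-256 digest."""
    return hashlib.sha256(str(x).encode()).digest()[0]

def target(x, y):
    """Program under test: the interesting branch needs y == opaque(x)."""
    if y == opaque(x):
        return "deep"
    return "shallow"

def concolic_step(x_seed, y_seed):
    """Run once concretely, then derive inputs for the other branch."""
    first = target(x_seed, y_seed)
    # Value of the opaque call as observed in the concrete run
    observed = opaque(x_seed)
    # With x fixed to x_seed, the constraint `y == opaque(x)` turns into
    # `y == observed`, which is solved by simply choosing y = observed.
    new_inputs = (x_seed, observed)
    return first, new_inputs

first, new_inputs = concolic_step(1, -1)
print(first, target(*new_inputs))
```

The price of this concretization is that `x` is no longer free: only inputs with `x == x_seed` are considered for this branch.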

We will examine an implementation that can handle practical programs next.

## Concolic Execution with PyExZ3

[PyExZ3](https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/dse.pdf) is a concolic evaluator of programs that takes a different strategy from what we did here. Similar to our dynamic taint approach, PyExZ3 wraps Python data structures in symbolic equivalents. These wrapped values are then traced through program execution, and the resulting constraints are collected at the end.
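
The wrapping idea can be sketched with operator overloading. The classes below are hypothetical and much simpler than PyExZ3's own symbolic types. They only support `-` and `<`, and they record constraints as strings in a global list `PATH` (also an assumption of this sketch).

```python
PATH = []  # path condition collected during one concrete run

class SymBool:
    def __init__(self, concrete, expr):
        self.concrete, self.expr = concrete, expr

    def __bool__(self):
        # Invoked by `if` and `while`: record which way the branch went.
        PATH.append(self.expr if self.concrete else "Not(%s)" % self.expr)
        return self.concrete

class SymInt:
    def __init__(self, concrete, expr):
        self.concrete, self.expr = concrete, expr

    @staticmethod
    def _parts(other):
        # Accept both wrapped values and plain Python numbers.
        if isinstance(other, SymInt):
            return other.concrete, other.expr
        return other, repr(other)

    def __sub__(self, other):
        c, e = self._parts(other)
        return SymInt(self.concrete - c, "(%s - %s)" % (self.expr, e))

    def __lt__(self, other):
        c, e = self._parts(other)
        return SymBool(self.concrete < c, "%s < %s" % (self.expr, e))

def classify(x, y):
    # An ordinary function; it is unaware that its arguments are wrapped.
    if x < y:
        return "less"
    if x - y < 10:
        return "close"
    return "far"

result = classify(SymInt(7, "x"), SymInt(3, "y"))
print(result, PATH)
```

The concrete values decide which branch is taken, while the expressions record why. Negating one of the collected constraints and handing the result to a solver would yield inputs for a different path.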

In [None]:
import PyExZ3.pyloader

In [None]:
import symbolic.symbolic_types as st

In [None]:
generatedInputs, returnVals, path = PyExZ3.pyloader.exploreFunction(check_triangle)

In [None]:
Source(path.toDot())

In [None]:
generatedInputs, returnVals, path = PyExZ3.pyloader.exploreFunction(gcd, max_iters=5)

In [None]:
Source(path.toDot())

In [None]:
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)

In [None]:
generatedInputs, returnVals, path = PyExZ3.pyloader.exploreFunction(factorial, max_iters=10)

In [None]:
Source(path.toDot())

In [None]:
import math

In [None]:
def discriminant(a, b, c):
    return b*b - 4 * a * c

def roots(a, b, c):
    if a == 0:
        if b == 0:
            return math.inf
        return -c/b
    d = discriminant(a, b, c)
    a2 = 2*a
    ba2 = b/a2
    if d == 0:
        return -ba2
    s = math.sqrt(d)
    return -ba2 + s/a2, -ba2 - s/a2
    #elif d < 0:
    #    s = math.sqrt(-d)
    #    return (-ba2, s/a2), (-ba2, -s/a2)
    #else:

In [None]:
generatedInputs, returnVals, path = PyExZ3.pyloader.exploreFunction(roots)

In [None]:
Source(path.toDot())

In [None]:
with ExpectError():
    generatedInputs, returnVals, path = PyExZ3.pyloader.exploreFunction(math.sqrt)

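The problem here is that *PyExZ3* does not know about *math.sqrt*, a built-in function whose implementation it cannot inspect. Can we help it by supplying our own implementation in Python?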

In [None]:
def sqrt(d):
    if d < 0:
        assert False
    xa = 0.5 * d
    xb = 0
    while (xa - xb) > 0.1:
        xb = 0.5 * (xa + d/xa)
        xa = xb
    s = xb
    return s
    
def discriminant(a, b, c):
    return b*b - 4 * a * c

def roots(a, b, c):
    if a == 0:
        if b == 0:
            return math.inf
        return -c/b
    d = discriminant(a, b, c)
    a2 = 2*a
    ba2 = b/a2
    if d == 0:
        return -ba2
    s = sqrt(d)
    return -ba2 + s/a2, -ba2 - s/a2

In [None]:
with ExpectError():
    generatedInputs, returnVals, path = PyExZ3.pyloader.exploreFunction(roots, max_iters=1000)

In [None]:
def sqrt(d):
    if d < 0:
        assert False
    xa = 0.5 * d
    xb = 0
    while (xa - xb) > 0.1:
        xb = 0.5 * (xa + d/xa)
        xa = xb
    s = xb
    return s
    
def discriminant(a, b, c):
    return b*b - 4 * a * c

def roots(a, b, c):
    if a == 0:
        if b == 0:
            return math.inf
        return -c/b
    d = discriminant(a, b, c)
    a2 = 2*a
    ba2 = b/a2
    if d == 0:
        return -ba2
    elif d < 0:
        s = sqrt(-d)
        return (-ba2, s/a2), (-ba2, -s/a2)
    else:
        s = sqrt(d)
        return -ba2 + s/a2, -ba2 - s/a2

In [None]:
generatedInputs, returnVals, path = PyExZ3.pyloader.exploreFunction(roots, max_iters=1000)

## Lessons Learned

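* Symbolic execution can augment fuzzing by deriving inputs that reach program paths which random generation is unlikely to hit.
* Purely symbolic approaches are limited by the growing number of paths and by functions that cannot be solved symbolically; concolic execution uses concrete executions to guide the symbolic exploration.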

## Background

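A widely used symbolic execution engine is KLEE \cite{KLEE}, which generates tests for programs compiled to LLVM bitcode.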
