# Symbolic Fuzzing

One of the problems with traditional methods of fuzzing is that they fail to penetrate deeply into the program. Quite often, a specific branch is executed only for very specific inputs, which may represent an extremely small fraction of the input space. Traditional fuzzing methods rely on chance to produce the inputs they need. However, relying on randomness to generate the values we want is a bad idea when the space to be explored is large. For example, a function that accepts a string already has $2^{80}$ possible inputs, even if one considers only the first $10$ characters at $8$ bits each. If one is looking for a specific string, random generation of values could take thousands of years, even on a supercomputer.
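To get a feeling for these numbers, here is a back-of-the-envelope sketch. The function name `years_to_enumerate()` and the two generation rates are assumptions chosen for illustration, not measurements of any real machine.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_enumerate(bits, inputs_per_second):
    """Years needed to try every input of the given bit width exactly once."""
    return 2 ** bits / inputs_per_second / SECONDS_PER_YEAR


# 10 characters of 8 bits each give an 80-bit input space.
# Both rates below are hypothetical: a single machine and a large cluster.
for rate in (10 ** 9, 10 ** 13):
    years = years_to_enumerate(80, rate)
    print(f"{rate:.0e} inputs/s: about {years:,.0f} years")
```

Even under the more generous assumed rate, exhausting the space takes far longer than any fuzzing campaign can afford.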

Symbolic execution is a way out of this problem. A program is a computation that can be treated as a system of equations that obtains the output values from the given inputs. Executing the program symbolically -- that is, solving these equations mathematically -- together with a specified objective, such as covering a particular branch or obtaining a particular output, yields inputs that accomplish this objective. In this chapter, we investigate how _symbolic execution_ can be implemented, and how it can be used to obtain interesting values for fuzzing.

**Prerequisites**

* You should have read the [chapter on coverage](Coverage.ipynb).
* Some knowledge of inheritance in Python is required.
* A familiarity with the [chapter on search based fuzzing](SearchBasedFuzzer.ipynb) would be useful.

## Using Symbolic Variables for Coverage

In the chapter on [parsing and recombining inputs](SearchBasedFuzzer.ipynb), we saw how difficult it was to generate inputs for `process_vehicle()` -- a simple function that accepts a string. The solution given there was to rely on preexisting sample inputs. However, this solution is inadequate, as it assumes that sample inputs exist. What if there are no sample inputs at hand?

For a simpler example, let us consider the following function. Can we generate inputs to cover all the paths?

In [None]:
def check_triangle(a, b, c):
    if a == b:
        if a == c:
            if b == c:
                return "Equilateral"
            else:
                return "Isosceles"
        else:
            return "Isosceles"
    else:
        if b != c:
            if a == c:
                return "Isosceles"
            else:
                return "Scalene"
        else:
            return "Isosceles"

The control flow graph of this function can be represented as follows:

In [None]:
from graphviz import Source, Graph

In [None]:
import fuzzingbook_utils

In [None]:
from ControlFlow import PyCFG, CFGNode, to_graph, gen_cfg

In [None]:
import inspect

In [None]:
graph = to_graph(gen_cfg(inspect.getsource(check_triangle)))

In [None]:
Source(graph.to_string())

The function takes three parameters. Its possible execution paths, each given as the sequence of source lines executed followed by the value returned, are the following.

```python
1: [1, 2, 3, 4, 5, Equilateral]
2: [1, 2, 3, 4, 7, Isosceles]
3: [1, 2, 3, 9, Isosceles]
4: [1, 2, 11, 12, 13, Isosceles]
5: [1, 2, 11, 12, 15, Scalene]
6: [1, 2, 11, 17, Isosceles]
```

To cover path <1>, we need to solve the following constraints.

In [None]:
import z3

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(a == b, a == c, b == c)

In [None]:
assert check_triangle(0, 0, 0) == 'Equilateral'

Similarly, to solve path <2>, we simply invert the last condition, `b == c` (line 4):

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(a == b, a == c, z3.Not(b == c))

The solver reports that there is no solution. A moment's reflection will convince us that this is indeed true: if `a == b` and `a == c` both hold, then `b == c` must hold as well, so path <2> is unreachable. Let us proceed with the other paths.

Next, we attempt path <3>, which we get by inverting the condition `a == c` (line 3):

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(a == b, z3.Not(a == c))

In [None]:
assert check_triangle(1, 1, 0) == 'Isosceles'

How about path <4>?

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(z3.Not(a == b), b != c, a == c)

In [None]:
assert check_triangle(1, 0, 1) == 'Isosceles'

Continuing to path <5>:

In [None]:
a, b, c = z3.Ints('a b c')

In [None]:
z3.solve(z3.Not(a == b), b != c, z3.Not(a == c))

This is surprising! We get negative numbers because, while we *know* that the sides of a triangle cannot be negative, the code does not enforce this. Let us see how the solution looks if we add that restriction.

In [None]:
z3.solve(a >= 0, b >= 0, c >= 0, z3.Not(a == b), b != c, z3.Not(a == c))

And indeed it is a *Scalene* triangle.

In [None]:
assert check_triangle(1, 0, 2) == 'Scalene'

For path <6>, the procedure is similar.

In [None]:
z3.solve(a >= 0, b >= 0, c >= 0, z3.Not(a == b), z3.Not(b != c))

In [None]:
assert check_triangle(0, 1, 1) == 'Isosceles'

That is, using simple symbolic computation, we were able to easily see that (1) some of the paths are not reachable, and (2) some of the conditions were insufficient. What about coverage?

In [None]:
from Coverage import Coverage

In [None]:
with Coverage() as cov:
    assert check_triangle(0, 0, 0) == 'Equilateral'
    assert check_triangle(1, 1, 0) == 'Isosceles'
    assert check_triangle(1, 0, 1) == 'Isosceles'
    assert check_triangle(1, 0, 2) == 'Scalene'
    assert check_triangle(0, 1, 1) == 'Isosceles'

In [None]:
source = inspect.getsource(check_triangle).strip().split('\n')

In [None]:
covered = {lineno for method, lineno in cov._trace}
for i, s in enumerate(source):
    print('%s %2d: %s' % ('#' if i + 1 in covered else ' ', i + 1, s))

The coverage is as expected. The generated values do seem to cover all code that can be covered. However, doing this by hand is tedious and error-prone. What we need is the ability to extract *all paths* in the program and to symbolically execute each path, which will generate the inputs required to cover all reachable portions of the program.

Indeed, doing this is fairly easy for a simple program such as `check_triangle()`. We first define `get_all_paths()`, which, given a starting node, recursively examines all child nodes and returns the traversed paths.

In [None]:
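def get_all_paths(fenter):
    """Yield every path from `fenter` to a leaf as a list of (index, node) pairs."""
    # A node without children ends the path
    if not fenter.children:
        yield [(0, fenter)]

    # Otherwise, prefix each path through a child with that child's index
    for idx, child in enumerate(fenter.children):
        for path in get_all_paths(child):
            yield [(idx, fenter)] + path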
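To see what `get_all_paths()` yields, here is a minimal sketch on a hand-built tree. The `Node` class and the node names are hypothetical stand-ins for the CFG nodes; the only property assumed of a node is its `children` list.

```python
class Node:
    """Hypothetical stand-in for a CFG node: a name and a list of children."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def get_all_paths(fenter):
    if not fenter.children:
        yield [(0, fenter)]

    for idx, child in enumerate(fenter.children):
        for path in get_all_paths(child):
            yield [(idx, fenter)] + path


# enter -> "if x" -> either "return T" or "return F"
then_leaf = Node("return T")
else_leaf = Node("return F")
condition = Node("if x", [then_leaf, else_leaf])
entry = Node("enter", [condition])

paths = [[(idx, node.name) for idx, node in path]
         for path in get_all_paths(entry)]
for path in paths:
    print(path)
```

Each pair records a node together with the index of the child that the path follows from it; leaves carry the index `0`. Note that this recursion assumes an acyclic graph: with a loop in the control flow, it would never terminate, so loops would need to be bounded in some way.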

In [None]:
def show_path(path):
    """Print each node on `path`, marking the branch taken as 'T' or 'F'."""
    last = None
    for _, elt in path:
        if last is not None:
            j = last.to_json()
            # Successors annotated as the true and false branch, if any
            t = last.child_node_annotations.get('T')
            f = last.child_node_annotations.get('F')
            if elt.rid == t:
                print(j['at'], j['ast'], 'T')
            elif elt.rid == f:
                print(j['at'], j['ast'], 'F')
            else:
                print(j['at'], j['ast'], '')
        last = elt
    print(last.to_json()['ast'], '')

In [None]:
cfg = PyCFG()
cfg.gen_cfg(inspect.getsource(check_triangle))
fnenter, fnexit = cfg.functions['check_triangle']

for path in get_all_paths(fnenter):
    show_path(path)
    print()

## Symbolic Execution

In [None]:
import PyExZ3.pyloader

In [None]:
gi, rv, path = PyExZ3.pyloader.exploreFunction(check_triangle)

In [None]:
Source(path.toDot())

## Lessons Learned

* One can use symbolic execution to generate inputs that cover specific paths of a program, and to show that some paths cannot be reached at all.

## Background

\cite{KLEE}
