# Fuzzing with Constraints

In this chapter, we show how to extend grammars with _constraints_ – conditions that are evaluated while a string is produced, and which have to be satisfied.

**Prerequisites**

* You should have read the [chapter on efficient grammar fuzzing](GrammarFuzzer.ipynb).

## Specifying Functions


## Generating Elements during Expansion

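With a `pre` option, the value of a symbol can be defined right out of a Python function. The function is called when the expansion is chosen, before its symbols are expanded any further, and its result replaces all or part of the expansion.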

In [None]:
import random

In [None]:
import fuzzingbook_utils

In [None]:
from Grammars import EXPR_GRAMMAR, is_valid_grammar, is_nonterminal, opts, exp_opt, exp_string
from GrammarFuzzer import GrammarFuzzer, all_terminals
import copy

In [None]:
constrained_expr_grammar = copy.deepcopy(EXPR_GRAMMAR)

constrained_expr_grammar.update(
    {
     "<start>": ["<expr>"],
     "<factor>": [
         "+<factor>",
         "-<factor>",
         "(<expr>)",
         ("<integer>.<integer>", opts(pre=lambda: (random.randint(100, 200), None))),
         ("<integer>", opts(pre=lambda: random.randint(100, 200))),
        ],
    }
)

In [None]:
def exp_pre_expansion_function(expansion):
    """Return the specified pre-expansion function, or None if unspecified"""
    return exp_opt(expansion, 'pre')

In [None]:
class ConstraintGrammarFuzzer(GrammarFuzzer):
    def supported_opts(self):
        return super().supported_opts() | {"pre", "post", "order"}

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def expansion_to_children(self, expansion):
        children = super().expansion_to_children(expansion)
        function = exp_pre_expansion_function(expansion)
        if function is None:
            return children
        
        assert callable(function)
        result = function()

        if self.log:
            print(repr(function) + "()", "=", repr(result))
        return self.apply_result(result, children)

The specified `function` can return one of several types:

* _Boolean_ values and `None` values are ignored.
* A _string_ $s$ replaces the entire expansion with $s$.
* A _tuple_ $(x_1, x_2, \dots, x_n)$ replaces the $i$-th nonterminal symbol of the expansion with $x_i$ for every $x_i$ that is not `None`.  If $x_i$ is not a string, it is converted to a string using `repr()`.
* All _other types_ are converted to strings, replacing the entire expansion.

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def apply_result(self, result, children):
        if isinstance(result, bool) or result is None:
            pass
        elif isinstance(result, str):
            children = [(result, [])]
        elif isinstance(result, tuple):
            symbol_indexes = [i for i, c in enumerate(children) if is_nonterminal(c[0])]

            for index, value in enumerate(result):
                if value is not None:
                    child_index = symbol_indexes[index]
                    if not isinstance(value, str):
                        value = repr(value)
                    if self.log:
                        print("Replacing", all_terminals(children[child_index]), "by", value)

                    # children[child_index] = (value, [])
                    (child_symbol, _) = children[child_index]
                    children[child_index] = (child_symbol, [(value, [])])
        else:
            if self.log:
                print("Replacing", "".join([all_terminals(c) for c in children]), "by", result)

            children = [(repr(result), [])]

        return children
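
The four rules can be tried out without a full fuzzer. The following is a minimal standalone sketch, not the class above: `apply_result_sketch()` and the simplified `is_nonterminal()` are stand-ins introduced here for illustration, and the children list is written by hand.

```python
def is_nonterminal(s):
    # Simplified stand-in (assumption): a nonterminal has the form <...>
    return s.startswith("<") and s.endswith(">")

def apply_result_sketch(result, children):
    """Apply a function result to a list of (symbol, children) pairs."""
    if isinstance(result, bool) or result is None:
        return children                      # Booleans and None are ignored
    if isinstance(result, str):
        return [(result, [])]                # a string replaces the whole expansion
    if isinstance(result, tuple):
        nonterminal_positions = [i for i, (symbol, _) in enumerate(children)
                                 if is_nonterminal(symbol)]
        new_children = list(children)
        for position, value in zip(nonterminal_positions, result):
            if value is not None:
                symbol, _ = new_children[position]
                text = value if isinstance(value, str) else repr(value)
                new_children[position] = (symbol, [(text, [])])
        return new_children
    return [(repr(result), [])]              # any other type, converted to a string

# Children of a freshly chosen "<integer>.<integer>" expansion
children = [("<integer>", None), (".", []), ("<integer>", None)]

unchanged = apply_result_sketch(True, children)
by_string = apply_result_sketch("3.14", children)
by_tuple = apply_result_sketch((150, None), children)
by_number = apply_result_sketch(42, children)
```

In the tuple case, only the first `<integer>` is fixed to `150`; the second one keeps its `None` children and is expanded by the fuzzer as usual.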

In [None]:
evaluating_fuzzer = ConstraintGrammarFuzzer(constrained_expr_grammar)
evaluating_fuzzer.fuzz()

## Checking Elements after Expansion

In [None]:
constrained_expr_grammar = copy.deepcopy(EXPR_GRAMMAR)

def eval_with_exception(s):
    with ExpectError():
        return eval(s)
    return False

constrained_expr_grammar.update(
    {
        "<start>": [("<expr>", opts(post=lambda s: eval_with_exception(s) > 10))]
    }
)

assert is_valid_grammar(constrained_expr_grammar)

In [None]:
def exp_post_expansion_function(expansion):
    """Return the specified post-expansion function, or None if unspecified"""
    return exp_opt(expansion, 'post')

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def eval_function(self, tree, function):
        symbol, children = tree
        # print("Does", all_terminals(tree), "satisfy", repr(function) + "?")

        assert callable(function)

        args = []
        for (symbol, exp) in children:
            if exp != [] and exp is not None:
                symbol_value = all_terminals((symbol, exp))
                args.append(symbol_value)
                
        result = function(*args)
        if self.log:
            print(repr(function) + repr(tuple(args)), "=", repr(result))

        return result
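
To see which arguments a post-expansion function receives, the argument collection can be replayed on a hand-written derivation tree. This is a sketch under assumptions: `all_terminals_sketch()` and `collect_args()` are stand-ins written for this example, and the tree is constructed manually rather than produced by a fuzzer.

```python
def all_terminals_sketch(tree):
    """Concatenate all terminal symbols of a derivation tree (stand-in)."""
    symbol, children = tree
    if children is None or children == []:
        return symbol          # unexpanded nonterminal or terminal symbol
    return "".join(all_terminals_sketch(c) for c in children)

def collect_args(tree):
    """Return the strings a post function would receive for `tree`."""
    _, children = tree
    return [all_terminals_sketch((symbol, exp))
            for symbol, exp in children
            if exp != [] and exp is not None]   # skip terminals such as "."

# Derivation tree for the factor "12.5"
factor_tree = ("<factor>", [
    ("<integer>", [("<digit>", [("1", [])]),
                   ("<integer>", [("<digit>", [("2", [])])])]),
    (".", []),
    ("<integer>", [("<digit>", [("5", [])])]),
])

args = collect_args(factor_tree)
check = lambda s1, s2: float(s1 + "." + s2) > 10
result = check(*args)
```

The terminal `.` is not passed along; the function gets one string per expanded nonterminal, here `"12"` and `"5"`.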

If the `function` returns the Boolean value `False`, the expansion is deemed invalid, and the fuzzer attempts to create another expansion.

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def find_expansion(self, tree):
        symbol, children = tree

        applied_expansion = \
            "".join([child_symbol for child_symbol, _ in children])

        for expansion in self.grammar[symbol]:
            if exp_string(expansion) == applied_expansion:
                return expansion
            
        raise KeyError(symbol + ": did not find expansion " + applied_expansion)
    
    # Return False if a constraint in TREE is violated;
    # otherwise, return the result of the post function of TREE (or None)
    def run_post_functions(self, tree, depth=float("inf")):
        symbol, children = tree
        if not children:
            # Terminal symbol or unexpanded nonterminal: nothing to check
            return None

        expansion = self.find_expansion(tree)
        function = exp_post_expansion_function(expansion)

        result = None
        if function is not None:
            result = self.eval_function(tree, function)
            if isinstance(result, bool) and not result:
                if self.log:
                    print(all_terminals(tree), "did not satisfy", symbol, "constraint")
                return False

            children = self.apply_result(result, children)

        # Check the subtrees even if this node has no constraint of its own
        if depth > 0:
            for c in children:
                child_result = self.run_post_functions(c, depth - 1)
                if isinstance(child_result, bool) and not child_result:
                    return False

        return result

The simplest method to check constraints is to retain only those trees that satisfy them.  This works, but can be very slow, as every tree that violates a constraint is discarded and generated anew from scratch.

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def fuzz_tree(self):
        while True:
            tree = super().fuzz_tree()
            result = self.run_post_functions(tree)
            if not isinstance(result, bool) or result:
                return tree
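
The cost of such a generate-and-filter loop can be illustrated independently of grammars. The sketch below is an assumption-laden toy: `generate_until_valid()`, the producer, and the constraint are all hypothetical names made up for this example; the producer merely stands in for a fuzzer that creates three numeric factors.

```python
import random

def generate_until_valid(produce, is_valid, rng):
    """Call `produce` until `is_valid` accepts; return (value, attempts)."""
    attempts = 0
    while True:
        attempts += 1
        candidate = produce(rng)
        if is_valid(candidate):
            return candidate, attempts

rng = random.Random(0)   # fixed seed, so that the run is repeatable

# Hypothetical producer: three independent factors between 0 and 99
produce = lambda rng: [rng.randint(0, 99) for _ in range(3)]
# Constraint: every single factor must exceed 10
is_valid = lambda factors: all(f > 10 for f in factors)

factors, attempts = generate_until_valid(produce, is_valid, rng)
```

Each factor passes with probability 89/100, so a candidate with $n$ factors is accepted with probability $0.89^n$. The more constrained elements an input contains, the more complete candidates are thrown away before one is accepted.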

In [None]:
from ExpectError import ExpectError, ExpectTimeout

In [None]:
constraint_grammar_fuzzer = ConstraintGrammarFuzzer(constrained_expr_grammar)
with ExpectTimeout(1):
    expr = constraint_grammar_fuzzer.fuzz()
expr

In [None]:
with ExpectError():
    eval(expr)

In [None]:
constrained_expr_grammar.update(
    {
     "<start>": ["<expr>"],
     "<factor>": [
         "+<factor>",
         "-<factor>",
         "(<expr>)",
         ("<integer>.<integer>", opts(post=lambda s1, s2: float(s1 + "." + s2) > 10)),
         ("<integer>", opts(post=lambda n: int(n) > 10))
        ],
    }
)

In [None]:
from Timer import Timer

In [None]:
constraint_grammar_fuzzer = ConstraintGrammarFuzzer(constrained_expr_grammar)
with Timer() as timer:
    print([constraint_grammar_fuzzer.fuzz() for i in range(10)])

In [None]:
timer.elapsed_time()

## Checking Elements Sooner

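To make things faster, we check each constraint as soon as the (sub)tree it applies to is complete.  If the check fails, only that subtree is expanded anew; once `replacement_attempts` such attempts are used up, the fuzzer starts over from scratch.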

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def expand_tree_once(self, tree):
        new_tree = super().expand_tree_once(tree)
        
        (symbol, children) = new_tree
        if all([exp_post_expansion_function(expansion) is None for expansion in self.grammar[symbol]]):
            # No constraints for this symbol
            return new_tree
                
        if self.any_possible_expansions(new_tree):
            # Still expanding
            return new_tree

        result = self.run_post_functions(new_tree, depth=0)
        if not isinstance(result, bool) or result:
            # No constraints, or constraint satisfied
            children = self.apply_result(result, children)
            new_tree = (symbol, children)
            return new_tree

        # Replace tree by unexpanded symbol and try again
        if self.log:
            print(all_terminals(new_tree), "did not satisfy", symbol, "constraint")
            
        if self.replacement_attempts_counter > 0:
            if self.log:
                print("Trying another expansion")
            self.replacement_attempts_counter -= 1
            return (symbol, None)
        
        if self.log:
            print("Starting from scratch")
        raise RestartExpansionException

In [None]:
class RestartExpansionException(Exception):
    pass

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def __init__(self, grammar, replacement_attempts=10, **kwargs):
        super().__init__(grammar, **kwargs)
        self.replacement_attempts = replacement_attempts

    def fuzz_tree(self):
        while True:
            self.replacement_attempts_counter = self.replacement_attempts
            try:
                tree = super().fuzz_tree()
                return tree
            except RestartExpansionException:
                continue
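
The interplay between local retries and a full restart can be sketched on plain numbers. This is a simplified model, not the fuzzer itself: `RestartSketch`, `produce_part()`, and `produce_all()` are hypothetical names, and each "part" stands in for a subtree with a constraint.

```python
import random

class RestartSketch(Exception):
    """Signals that the shared retry budget is used up (hypothetical name)."""

def produce_part(rng, is_valid, budget):
    """Retry a single part; `budget` is a one-element list shared by all parts."""
    while True:
        candidate = rng.randint(0, 99)
        if is_valid(candidate):
            return candidate
        if budget[0] == 0:
            raise RestartSketch          # give up and start from scratch
        budget[0] -= 1                   # otherwise, retry this part only

def produce_all(rng, is_valid, parts=3, replacement_attempts=10):
    restarts = 0
    while True:
        budget = [replacement_attempts]  # reset for every fresh start
        try:
            values = [produce_part(rng, is_valid, budget) for _ in range(parts)]
            return values, restarts
        except RestartSketch:
            restarts += 1

rng = random.Random(0)
parts, restarts = produce_all(rng, lambda n: n > 10)
```

A failed part costs one retry of that part rather than a whole new candidate, which is why checking early tends to pay off when constraints are local.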

In [None]:
constraint_grammar_fuzzer = ConstraintGrammarFuzzer(constrained_expr_grammar)
with Timer() as timer:
    print([constraint_grammar_fuzzer.fuzz() for i in range(10)])

In [None]:
timer.elapsed_time()

## Ordering Expansions

In [None]:
from IPython.display import HTML, display

In [None]:
from GrammarFuzzer import display_tree

In [None]:
def exp_order(expansion):
    """Return the specified expansion ordering, or None if unspecified"""
    return exp_opt(expansion, 'order')

In [None]:
class ConstraintGrammarFuzzer(ConstraintGrammarFuzzer):
    def choose_tree_expansion(self, tree, expandable_children):
        """Return index of subtree in `children` to be selected for expansion.  Defaults to random."""
        (symbol, tree_children) = tree
        if len(expandable_children) == 1:
            # No choice
            return super().choose_tree_expansion(tree, expandable_children)

        expansion = self.find_expansion(tree)
        given_order = exp_order(expansion)
        if given_order is None:
            # No order specified
            return super().choose_tree_expansion(tree, expandable_children)

        nonterminal_children = [c for c in tree_children if c[1] != []]
        assert len(nonterminal_children) == len(given_order), "Order must have one element for each nonterminal"

        # print("Checking ", expandable_children, "against", nonterminal_children, repr(given_order))

        # Find expandable child with lowest ordering
        min_given_order = None
        min_index = None
        j = 0
        for k, expandable_child in enumerate(expandable_children):
            while j < len(nonterminal_children) and expandable_child != nonterminal_children[j]:
                j += 1
            assert j < len(nonterminal_children), "Expandable child not found"
            if self.log:
                print("Expandable child #%d %s has order %d" % (k, expandable_child[0], given_order[j]))

            if min_given_order is None or given_order[j] < min_given_order:
                # Remember the lowest order value and the index of the child that has it
                min_given_order = given_order[j]
                min_index = k

        assert min_index is not None
        
        if self.log:
            print("Returning expandable child #%d %s" % 
                  (min_index, expandable_children[min_index][0]))

        return min_index
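
The selection rule itself is small enough to sketch in isolation: among the children that can still be expanded, pick the one with the lowest order value and return its index. The helper `lowest_order_index()` is a hypothetical name used only for this illustration; it takes the order values of the expandable children, in their original sequence.

```python
def lowest_order_index(expandable_orders):
    """Return the index of the smallest order value; ties go to the first one."""
    best_index = None
    best_order = None
    for index, order in enumerate(expandable_orders):
        if best_order is None or order < best_order:
            best_order = order       # lowest order value seen so far
            best_index = index       # position of the child that has it
    return best_index

# "<identifier>=<expr>" with order=(1, 2): the identifier is expanded first
first_choice = lowest_order_index([1, 2])
# Once <identifier> is complete, only <expr> (order 2) is still expandable
second_choice = lowest_order_index([2])
# Orders need not be ascending: here, the third expandable child wins
third_choice = lowest_order_index([3, 2, 1])
```

Note that the order value and the returned index are two different things; the comparison uses the former, the result is the latter.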

## Usage Examples

### Matching Tags

In [None]:
from Grammars import crange

In [None]:
XML_GRAMMAR = {
    "<start>": ["<xml-tree>"],
    "<xml-tree>": ["<<id>><xml-content></<id>>"],
    "<xml-content>": ["Text", "<xml-tree>"],
    "<id>": ["<letter>", "<id><letter>"],
    "<letter>": crange('a', 'z')
}

assert is_valid_grammar(XML_GRAMMAR)

In [None]:
xml_fuzzer = GrammarFuzzer(XML_GRAMMAR)
xml_fuzzer.fuzz()

In [None]:
XML_GRAMMAR.update({
    "<xml-tree>": [("<<id>><xml-content></<id>>",
                    opts(post=lambda id1, content, id2: (None, None, id1))
                   )]
})

In [None]:
xml_fuzzer = ConstraintGrammarFuzzer(XML_GRAMMAR)
xml_fuzzer.fuzz()
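
What the `post` function does to a single `<xml-tree>` expansion can be replayed on plain strings. In this sketch, `match_closing_tag()` is the lambda from the grammar under a name, and `render()` is a hypothetical helper that mimics how a tuple result overwrites individual symbols.

```python
def match_closing_tag(id1, content, id2):
    # Keep the first <id> and the content; overwrite the second <id> with the first
    return (None, None, id1)

def render(id1, content, id2):
    """Assemble "<<id>><xml-content></<id>>" after applying the post result."""
    parts = [id1, content, id2]
    replacements = match_closing_tag(id1, content, id2)
    parts = [old if new is None else new
             for old, new in zip(parts, replacements)]
    return "<{0}>{1}</{2}>".format(*parts)

unconstrained = "<{0}>{1}</{2}>".format("ab", "Text", "xyz")
constrained = render("ab", "Text", "xyz")
```

Without the constraint, opening and closing tags are generated independently and rarely match; with it, the closing tag is a copy of the opening one.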

### Checksums

In [None]:
def luhn_checksum(s):
    LUHN_ODD_LOOKUP = (0, 2, 4, 6, 8, 1, 3, 5, 7, 9)  # sum_of_digits (index * 2)
    
    evens = sum(int(p) for p in s[-1::-2])
    odds = sum(LUHN_ODD_LOOKUP[int(p)] for p in s[-2::-2])
    return (evens + odds) % 10

def valid_luhn_checksum(s):
    return luhn_checksum(s[:-1]) == int(s[-1])

def fix_luhn_checksum(s):
    return s[:-1] + repr(luhn_checksum(s[:-1]))

In [None]:
luhn_checksum("123")

In [None]:
fix_luhn_checksum("123x")
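
The two calls above can be checked by hand. The following standalone sketch repeats the checksum function as defined in this chapter, so that it runs on its own, and spells out the arithmetic in comments.

```python
LUHN_ODD_LOOKUP = (0, 2, 4, 6, 8, 1, 3, 5, 7, 9)  # digit sum of 2 * index

def luhn_checksum(s):
    evens = sum(int(p) for p in s[-1::-2])                   # last, third-last, ...
    odds = sum(LUHN_ODD_LOOKUP[int(p)] for p in s[-2::-2])   # second-last, ...
    return (evens + odds) % 10

# "123": the digits "3" and "1" are added as they are (sum 4);
# the digit "2" is doubled to 4
checksum_123 = luhn_checksum("123")       # (4 + 4) % 10

# A doubled digit of 5 or more has two digits, which are added up:
# "7" doubles to 14, contributing 1 + 4 = 5
checksum_70 = luhn_checksum("70")         # (0 + 5) % 10

# Fixing "123x": drop the last character, append the checksum of the rest
fixed = "123x"[:-1] + str(luhn_checksum("123"))
```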

In [None]:
CREDIT_CARD_GRAMMAR = {
    "<start>": ["<credit-card-number>"],
    "<credit-card-number>": [("<digits>", opts(post=fix_luhn_checksum))],
    "<digits>": ["<digit-block><digit-block><digit-block><digit-block>"],
    "<digit-block>": ["<digit><digit><digit><digit>"],
    "<digit>": crange('0', '9')
}

assert is_valid_grammar(CREDIT_CARD_GRAMMAR)

In [None]:
g = GrammarFuzzer(CREDIT_CARD_GRAMMAR)
cc_number = g.fuzz()
cc_number

In [None]:
valid_luhn_checksum(cc_number)

In [None]:
fixed_cc_number = fix_luhn_checksum(cc_number)
fixed_cc_number

In [None]:
valid_luhn_checksum(fixed_cc_number)

In [None]:
fixing_fuzzer = ConstraintGrammarFuzzer(CREDIT_CARD_GRAMMAR)
cc_number = fixing_fuzzer.fuzz()
cc_number

In [None]:
valid_luhn_checksum(cc_number)

### Defining and Using Identifiers

In [None]:
from Parser import VAR_GRAMMAR

In [None]:
SYMBOL_TABLE = set()

In [None]:
def define_id(id):
    SYMBOL_TABLE.add(id)

In [None]:
def use_id():
    if len(SYMBOL_TABLE) == 0:
        return False

    id = random.choice(list(SYMBOL_TABLE))
    return id

In [None]:
def clear_symbol_table():
    global SYMBOL_TABLE
    SYMBOL_TABLE = set()

In [None]:
CONSTRAINED_VAR_GRAMMAR = copy.deepcopy(VAR_GRAMMAR)
CONSTRAINED_VAR_GRAMMAR.update({
    "<start>": [("<statements>", opts(pre=lambda: clear_symbol_table()))],
    "<assignment>": [("<identifier>=<expr>", opts(post=lambda id, expr: define_id(id),
                                                  order=(1, 2)))],
    "<factor>": ['+<factor>', '-<factor>', '(<expr>)',
                 ("<identifier>", opts(post=lambda _: use_id())),
                 '<number>'],
    "<statements>": [("<statement>;<statements>", opts(order=(1, 2))),
                      "<statement>"]
})

assert is_valid_grammar(CONSTRAINED_VAR_GRAMMAR)

In [None]:
from ExpectError import ExpectTimeout

In [None]:
g = ConstraintGrammarFuzzer(CONSTRAINED_VAR_GRAMMAR)
for i in range(10):
    print(g.fuzz())
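
To check that the produced programs indeed use only identifiers that were assigned earlier, a small independent validator helps. This is a sketch under assumptions about the concrete syntax: statements are separated by `;`, assignments have the form `identifier=expression`, and identifiers consist of a letter followed by letters or digits. The function `uses_only_defined_ids()` is a hypothetical helper, not part of the chapter's code.

```python
import re

def uses_only_defined_ids(program):
    """Check that every identifier in an expression was assigned earlier."""
    defined = set()
    for statement in program.split(";"):
        if "=" in statement:
            target, expr = statement.split("=", 1)
        else:
            target, expr = None, statement
        used = set(re.findall(r"[A-Za-z][A-Za-z0-9]*", expr))
        if not used <= defined:
            return False             # some identifier is used before its definition
        if target is not None:
            defined.add(target)      # the target counts as defined from now on
    return True

ok = uses_only_defined_ids("a=1;b=a+2;b*a")
bad = uses_only_defined_ids("a=b;b=1")
```

Such a check could be applied to every string the fuzzer prints, for instance in an `assert`.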

## All Together

In [None]:
from ProbabilisticGrammarFuzzer import ProbabilisticGrammarFuzzer, ProbabilisticGrammarCoverageFuzzer

In [None]:
class BigFatGrammarFuzzer(ProbabilisticGrammarFuzzer, ConstraintGrammarFuzzer):
    pass

In [None]:
class BigFatGrammarCoverageFuzzer(ProbabilisticGrammarCoverageFuzzer, ConstraintGrammarFuzzer):
    pass
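
Combining fuzzers this way relies on Python's method resolution order: each class extends a method and passes the rest on via `super()`. The toy classes below are stand-ins invented for this sketch, and the `prob` option name is an assumption about the probabilistic fuzzer; the sketch only shows how the chain of `super()` calls merges the contributions of both parents.

```python
class BaseFuzzerSketch:
    def supported_opts(self):
        return set()

class ProbabilisticSketch(BaseFuzzerSketch):
    def supported_opts(self):
        return super().supported_opts() | {"prob"}

class ConstraintSketch(BaseFuzzerSketch):
    def supported_opts(self):
        return super().supported_opts() | {"pre", "post", "order"}

class CombinedSketch(ProbabilisticSketch, ConstraintSketch):
    pass

# Order in which Python looks up methods on the combined class
mro_names = [cls.__name__ for cls in CombinedSketch.__mro__]
# Each supported_opts() adds its own options to those of the next class
combined_opts = CombinedSketch().supported_opts()
```

This only works for methods that call `super()`; a method that does not do so in the first parent hides the same method in the second one.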

## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

1. Implement a syntax that allows people to refer to subtrees – say, `$1.$2` is the second child of the first symbol.


### Exercise 1: _Title_

_Text of the exercise_

In [None]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [None]:
# Some code for the solution
2 + 2

_Some more text for the solution_

In [None]:
# Sketch of a possible attribute syntax: `random_name()` and `find_name()`
# are placeholders that are not defined in this chapter
ATTR_GRAMMAR = {
 "<clause>": [("<xml-open>Text<xml-close>", opts(constraint=lambda x1, x2: x1.name == x2.name))],
 "<xml-open>": [("<langle><tag><rangle>", opts(producer=lambda: (None, opts(name=random_name()), None)))],
 "<xml-close>": [("<langle>/<tag><rangle>", opts(producer=lambda: (None, find_name(), None)))],
}

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_