# Probabilistic Grammar Fuzzing

Let us give grammars even more power by assigning _probabilities_ to individual expansions.  This lets us control how often each expansion is chosen, and thus how frequently each element appears in the generated inputs.

**Prerequisites**

* You should have read the [chapter on grammars](Grammars.ipynb).

In [None]:
import fuzzingbook_utils

In [None]:
from GrammarFuzzer import GrammarFuzzer, all_terminals

In [None]:
from Grammars import is_valid_grammar, EXPR_GRAMMAR, START_SYMBOL, crange

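We introduce a little helper function that allows us to attach arbitrary options to an expansion.  An annotated expansion is a pair of the expansion string and a dictionary of options; plain strings remain valid expansions.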

In [None]:
def opts(**kwargs):
    return kwargs

In [None]:
PROBABILISTIC_EXPR_GRAMMAR = {
    "<start>":
        ["<expr>"],

    "<expr>":
        [("<term> + <expr>", opts(prob=0.1)),
         ("<term> - <expr>", opts(prob=0.2)),
         "<term>"],

    "<term>":
        [("<factor> * <term>", opts(prob=0.1)),
         ("<factor> / <term>", opts(prob=0.1)),
         "<factor>"
         ],

    "<factor>":
        ["+<factor>", "-<factor>", "(<expr>)",
            "<leadinteger>", "<leadinteger>.<integer>"],

    "<leadinteger>":
        ["<leaddigit><integer>", "<leaddigit>"],

    # Benford's law: frequency distribution of leading digits
    "<leaddigit>":
        [("1", opts(prob=0.301)),
         ("2", opts(prob=0.176)),
         ("3", opts(prob=0.125)),
         ("4", opts(prob=0.097)),
         ("5", opts(prob=0.079)),
         ("6", opts(prob=0.067)),
         ("7", opts(prob=0.058)),
         ("8", opts(prob=0.051)),
         ("9", opts(prob=0.046)),
         ],

    # Remaining digits are equally distributed
    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
}

In [None]:
assert is_valid_grammar(PROBABILISTIC_EXPR_GRAMMAR)

In [None]:
PROBABILISTIC_EXPR_GRAMMAR["<leaddigit>"]
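
The `<leaddigit>` probabilities in the grammar follow Benford's law, under which a leading digit `d` occurs with probability `log10(1 + 1/d)`.  As a quick sanity check, the following sketch recomputes these values and rounds them to three places.  The names `benford_probability` and `benford_table` are introduced here for illustration only; they are not part of the chapter's code.

```python
import math


def benford_probability(digit):
    """Probability that `digit` (1..9) is the leading digit under Benford's law."""
    return math.log10(1 + 1 / digit)


# Map each leading digit to its probability, rounded as in the grammar
benford_table = {str(d): round(benford_probability(d), 3) for d in range(1, 10)}
print(benford_table)
```

The rounded values should agree with the `prob` annotations of `<leaddigit>`, and the unrounded probabilities sum to 1.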

In [None]:
def exp_string(expansion):
    """Return the string to be expanded"""
    if isinstance(expansion, str):
        return expansion
    return expansion[0]

In [None]:
exp_string(PROBABILISTIC_EXPR_GRAMMAR["<leaddigit>"][0])

In [None]:
def exp_prob(expansion):
    """Return the specified probability, or None if unspecified"""
    if isinstance(expansion, str):
        return None
    # Options may be present without a 'prob' entry
    return expansion[1].get('prob')

In [None]:
exp_prob(PROBABILISTIC_EXPR_GRAMMAR["<leaddigit>"][0])
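
To see how annotated and plain expansions coexist, here is a small standalone sketch that applies both accessors to the three alternatives of `<expr>`.  The helpers are restated compactly so that the snippet runs on its own; this version of `exp_prob` uses `dict.get()`, so an options dictionary without a `prob` entry also yields `None`.  The names `expr_expansions` and `summary` are illustrative.

```python
def opts(**kwargs):
    return kwargs


def exp_string(expansion):
    """Return the expansion string, with or without attached options."""
    return expansion if isinstance(expansion, str) else expansion[0]


def exp_prob(expansion):
    """Return the annotated probability, or None if there is none."""
    if isinstance(expansion, str):
        return None
    return expansion[1].get('prob')


expr_expansions = [
    ("<term> + <expr>", opts(prob=0.1)),
    ("<term> - <expr>", opts(prob=0.2)),
    "<term>",  # plain string: no probability given
]

summary = [(exp_string(e), exp_prob(e)) for e in expr_expansions]
print(summary)
```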

The probabilistic grammar works with our existing infrastructure:

In [None]:
f = GrammarFuzzer(PROBABILISTIC_EXPR_GRAMMAR)
f.fuzz()

In [None]:
from GrammarCoverageFuzzer import GrammarCoverageFuzzer

In [None]:
f = GrammarCoverageFuzzer(PROBABILISTIC_EXPR_GRAMMAR)
f.fuzz()

## Checking Probabilities

In [None]:
def exp_probabilities(expansions, nonterminal="<symbol>"):
    """Return a mapping from each expansion string to its probability"""
    probabilities = [exp_prob(expansion) for expansion in expansions]
    prob_dist = prob_distribution(probabilities, nonterminal)

    prob_mapping = {}
    for i in range(len(expansions)):
        expansion = exp_string(expansions[i])
        prob_mapping[expansion] = prob_dist[i]

    return prob_mapping

In [None]:
def prob_distribution(probabilities, nonterminal="<symbol>"):
    """Return `probabilities` with each None replaced by an equal share
    of the probability mass left over by the specified entries"""
    epsilon = 0.00001

    number_of_unspecified_probabilities = probabilities.count(None)
    if number_of_unspecified_probabilities == 0:
        # Everything is specified: the values must already sum to 1.0
        assert abs(sum(probabilities) - 1.0) < epsilon, \
            nonterminal + ": sum of probabilities must be 1.0"
        return probabilities

    sum_of_specified_probabilities = 0.0
    for p in probabilities:
        if p is not None:
            sum_of_specified_probabilities += p
    assert 0 <= sum_of_specified_probabilities <= 1.0, \
        nonterminal + ": sum of specified probabilities must be between 0.0 and 1.0"

    # Distribute the remaining mass equally among unspecified expansions
    default_probability = ((1.0 - sum_of_specified_probabilities) /
                           number_of_unspecified_probabilities)
    all_probabilities = []
    for p in probabilities:
        if p is None:
            p = default_probability
        all_probabilities.append(p)

    assert abs(sum(all_probabilities) - 1.0) < epsilon
    return all_probabilities

In [None]:
PROBABILISTIC_EXPR_GRAMMAR["<leaddigit>"]

In [None]:
exp_probabilities(PROBABILISTIC_EXPR_GRAMMAR["<leaddigit>"])

In [None]:
exp_probabilities(PROBABILISTIC_EXPR_GRAMMAR["<digit>"])

In [None]:
exp_probabilities(PROBABILISTIC_EXPR_GRAMMAR["<expr>"])
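
The rule applied to unspecified probabilities is that they share the remaining probability mass equally.  The following standalone sketch restates that rule without the assertions; `fill_unspecified` is a hypothetical name used only here.  For `<expr>`, the two specified values 0.1 and 0.2 leave 0.7 for `<term>`; for `<digit>`, each of the ten alternatives receives 0.1.

```python
def fill_unspecified(probabilities):
    """Replace each None by an equal share of the probability mass left over."""
    specified = [p for p in probabilities if p is not None]
    unspecified = len(probabilities) - len(specified)
    if unspecified == 0:
        return list(probabilities)

    share = (1.0 - sum(specified)) / unspecified
    return [share if p is None else p for p in probabilities]


expr_probs = fill_unspecified([0.1, 0.2, None])   # <expr>
digit_probs = fill_unspecified([None] * 10)       # <digit>
print(expr_probs)
print(digit_probs)
```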

In [None]:
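def is_valid_probabilistic_grammar(grammar, start_symbol=START_SYMBOL):
    """Check that `grammar` is valid and that the probabilities
    of each nonterminal form a proper distribution"""
    if not is_valid_grammar(grammar, start_symbol):
        return False

    for nonterminal in grammar:
        expansions = grammar[nonterminal]
        # Raises an AssertionError if the probabilities are inconsistent
        exp_probabilities(expansions, nonterminal)

    return True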

In [None]:
assert is_valid_probabilistic_grammar(PROBABILISTIC_EXPR_GRAMMAR)

In [None]:
assert is_valid_probabilistic_grammar(EXPR_GRAMMAR)

In [None]:
from ExpectError import ExpectError

In [None]:
with ExpectError():
    assert not is_valid_probabilistic_grammar({"<start>": [("1", opts(prob=0.5))]})

In [None]:
with ExpectError():
    assert not is_valid_probabilistic_grammar({"<start>": [("1", opts(prob=1.5)), "2"]})

## Selecting by Probability

In [None]:
import random

In [None]:
class ProbabilisticGrammarFuzzer(GrammarFuzzer):
    def choose_node_expansion(self, node, possible_children):
        """Return the index of the expansion to apply, chosen by probability"""
        (symbol, tree) = node
        expansions = self.grammar[symbol]
        probabilities = exp_probabilities(expansions, symbol)

        weights = []
        for child in possible_children:
            # Recover the expansion string from the candidate children
            child_weight = probabilities[all_terminals((symbol, child))]
            weights.append(child_weight)

        return random.choices(range(len(possible_children)), weights=weights)[0]
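
The selection step relies on `random.choices()`, which draws from a population with the given relative weights.  As a minimal standalone sketch, the following draws an index among three alternatives weighted like those of `<expr>` and tallies the observed frequencies.  The fixed seed and the names `alternatives`, `picks`, and `frequencies` are assumptions made for this illustration.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed only to make this illustration repeatable

alternatives = ["<term> + <expr>", "<term> - <expr>", "<term>"]
weights = [0.1, 0.2, 0.7]

trials = 10000
# random.choices() returns a list; we take its single element, an index
picks = Counter(
    random.choices(range(len(alternatives)), weights=weights)[0]
    for _ in range(trials)
)

frequencies = [picks[i] / trials for i in range(len(alternatives))]
print(frequencies)
```

With enough trials, the observed frequencies should come close to the weights.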

In [None]:
f = ProbabilisticGrammarFuzzer(PROBABILISTIC_EXPR_GRAMMAR)
f.fuzz()

In [None]:
leaddigit_fuzzer = ProbabilisticGrammarFuzzer(PROBABILISTIC_EXPR_GRAMMAR, start_symbol="<leaddigit>")
leaddigit_fuzzer.fuzz()

In [None]:
trials = 10000

count = {}
for c in crange('0', '9'):
    count[c] = 0

for i in range(trials):
    count[leaddigit_fuzzer.fuzz()] += 1

print([(digit, count[digit] / trials) for digit in count])

## Lessons Learned

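* Annotating expansions with probabilities controls how often each alternative is chosen during fuzzing.
* Expansions without an explicit probability share the remaining probability mass equally.
* Probabilistic selection can be combined with coverage guidance: first cover all expansions, then choose by probability.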

## Exercises

### Exercise 1

Create a class `ProbabilisticGrammarCoverageFuzzer` that extends `GrammarCoverageFuzzer` with probabilistic capabilities.  The idea is to first cover all uncovered expansions (like `GrammarCoverageFuzzer`) and, once all expansions are covered, to proceed by probabilities (like `ProbabilisticGrammarFuzzer`).  To this end, override the `choose_covered_node_expansion()` and `choose_uncovered_node_expansion()` methods such that they choose an expansion based on the given weights.  If you are an advanced programmer, implement the class via _multiple inheritance_ from `GrammarCoverageFuzzer` and `ProbabilisticGrammarFuzzer`.

**Solution**.  With multiple inheritance, this is fairly easy; we just need to point the three methods to the right places:

In [None]:
class ProbabilisticGrammarCoverageFuzzer(GrammarCoverageFuzzer, ProbabilisticGrammarFuzzer):
    # Choose uncovered expansions first
    def choose_node_expansion(self, node, possible_children):
        return GrammarCoverageFuzzer.choose_node_expansion(self, node, possible_children)

    # Among uncovered expansions, pick by (relative) probability
    def choose_uncovered_node_expansion(self, node, possible_children):
        return ProbabilisticGrammarFuzzer.choose_node_expansion(self, node, possible_children)
    
    # For covered nodes, pick by probability, too
    def choose_covered_node_expansion(self, node, possible_children):
        return ProbabilisticGrammarFuzzer.choose_node_expansion(self, node, possible_children)
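
To illustrate the delegation pattern in isolation, here is a toy sketch that does not depend on the fuzzer classes.  All class names, the `WEIGHTS` table, and the method names are hypothetical stand-ins: one base class prefers options it has not chosen before, the other chooses by weight, and the combined class names the base class explicitly for each method.

```python
import random


class ToyCoverageChooser:
    """Prefers options not chosen before (a stand-in for coverage guidance)."""

    def __init__(self):
        self.covered = set()

    def choose(self, options):
        uncovered = [i for i, o in enumerate(options) if o not in self.covered]
        if uncovered:
            candidates = [options[i] for i in uncovered]
            index = uncovered[self.choose_uncovered(candidates)]
        else:
            index = self.choose_covered(options)
        self.covered.add(options[index])
        return index

    def choose_uncovered(self, options):
        return 0

    def choose_covered(self, options):
        return 0


class ToyProbabilisticChooser:
    """Chooses an index according to a fixed (hypothetical) weight table."""
    WEIGHTS = {"a": 0.8, "b": 0.1, "c": 0.1}

    def choose(self, options):
        weights = [self.WEIGHTS[o] for o in options]
        return random.choices(range(len(options)), weights=weights)[0]


class ToyCombinedChooser(ToyCoverageChooser, ToyProbabilisticChooser):
    # Coverage decides *which set* to pick from ...
    def choose(self, options):
        return ToyCoverageChooser.choose(self, options)

    # ... and probabilities decide *within* that set
    def choose_uncovered(self, options):
        return ToyProbabilisticChooser.choose(self, options)

    def choose_covered(self, options):
        return ToyProbabilisticChooser.choose(self, options)


options = ["a", "b", "c"]
chooser = ToyCombinedChooser()
first_three = [options[chooser.choose(options)] for _ in range(3)]
print(sorted(first_three))
```

Naming the base class explicitly, rather than relying on `super()`, makes it clear which of the two inherited `choose()` implementations each method delegates to.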

In the first nine invocations, our fuzzer covers one digit after another:

In [None]:
cov_leaddigit_fuzzer = ProbabilisticGrammarCoverageFuzzer(PROBABILISTIC_EXPR_GRAMMAR, start_symbol="<leaddigit>")
print([cov_leaddigit_fuzzer.fuzz() for i in range(9)])

After these, we again proceed by probabilities:

In [None]:
trials = 10000

count = {}
for c in crange('0', '9'):
    count[c] = 0

for i in range(trials):
    count[cov_leaddigit_fuzzer.fuzz()] += 1

print([(digit, count[digit] / trials) for digit in count])
