# Mutations with Grammars

This notebook gives a short introduction to using the `fuzzingbook` framework for grammar-based mutation, both for data and for code.

**Prerequisites**

* This chapter is meant to be self-contained.

## Defining Grammars

We define a grammar using standard Python data structures.  Suppose we want to encode this grammar:

```
<start>   ::= <expr>
<expr>    ::= <term> + <expr> | <term> - <expr> | <term>
<term>    ::= <factor> * <term> | <factor> / <term> | <factor>
<factor>  ::= +<factor> | -<factor> | (<expr>) | <integer> | <integer>.<integer>
<integer> ::= <digit><integer> | <digit>
<digit>   ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
```


In [None]:
import bookutils

In [None]:
from Grammars import syntax_diagram, is_valid_grammar, convert_ebnf_grammar, srange, crange

In Python, we encode this as a mapping (a dictionary) from nonterminal symbols to a list of possible expansions:

In [None]:
EXPR_GRAMMAR = {
    "<start>":
        ["<expr>"],

    "<expr>":
        ["<term> + <expr>", "<term> - <expr>", "<term>"],

    "<term>":
        ["<factor> * <term>", "<factor> / <term>", "<factor>"],

    "<factor>":
        ["+<factor>",
         "-<factor>",
         "(<expr>)",
         "<integer>.<integer>",
         "<integer>"],

    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
}

In [None]:
assert is_valid_grammar(EXPR_GRAMMAR)

In [None]:
syntax_diagram(EXPR_GRAMMAR)
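
To make the dictionary encoding more concrete, here is a rough sketch of the kind of consistency checks one might run on such a grammar. It is not the implementation of `is_valid_grammar`; the helper names `undefined_nonterminals` and `unreachable_nonterminals`, the regular expression, and the shortened `SMALL_GRAMMAR` are assumptions made for illustration.

```python
import re

# A shortened grammar in the same dictionary format (illustrative only).
SMALL_GRAMMAR = {
    "<start>": ["<integer>"],
    "<integer>": ["<digit><integer>", "<digit>"],
    "<digit>": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"],
}

# Assumption: a nonterminal is any <...> token without spaces or nested brackets.
NONTERMINAL = re.compile(r"<[^<> ]*>")


def undefined_nonterminals(grammar):
    """Return nonterminals that occur in some expansion but have no rule."""
    used = set()
    for expansions in grammar.values():
        for expansion in expansions:
            used.update(NONTERMINAL.findall(expansion))
    return used - set(grammar)


def unreachable_nonterminals(grammar, start="<start>"):
    """Return defined nonterminals that cannot be reached from `start`."""
    reachable, worklist = set(), [start]
    while worklist:
        symbol = worklist.pop()
        if symbol in reachable or symbol not in grammar:
            continue
        reachable.add(symbol)
        for expansion in grammar[symbol]:
            worklist.extend(NONTERMINAL.findall(expansion))
    return set(grammar) - reachable


print(undefined_nonterminals(SMALL_GRAMMAR))
print(unreachable_nonterminals(SMALL_GRAMMAR))
```

Both sets are empty for a consistent grammar. Adding a rule that nothing refers to, or an expansion that mentions a missing symbol, makes the corresponding set non-empty.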

## Fuzzing with Grammars

We mostly use grammars for _fuzzing_, that is, for producing random inputs that conform to the grammar:

In [None]:
from GrammarFuzzer import GrammarFuzzer

In [None]:
expr_fuzzer = GrammarFuzzer(EXPR_GRAMMAR)
for i in range(10):
    print(expr_fuzzer.fuzz())
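
To give an idea of what producing strings from a grammar involves, here is a minimal, self-contained sketch. It is not how `GrammarFuzzer` is implemented. It picks expansions at random up to a depth limit and then switches to the cheapest expansion so that the derivation terminates. The names `produce`, `symbol_cost`, `expansion_cost`, and the `max_depth` parameter are assumptions made for this example.

```python
import random
import re

RE_NONTERMINAL = re.compile(r"(<[^<> ]*>)")

GRAMMAR = {
    "<start>": ["<expr>"],
    "<expr>": ["<term> + <expr>", "<term> - <expr>", "<term>"],
    "<term>": ["<factor> * <term>", "<factor> / <term>", "<factor>"],
    "<factor>": ["+<factor>", "-<factor>", "(<expr>)",
                 "<integer>.<integer>", "<integer>"],
    "<integer>": ["<digit><integer>", "<digit>"],
    "<digit>": list("0123456789"),
}


def symbol_cost(grammar, symbol, seen=frozenset()):
    """Minimum number of expansion steps to turn `symbol` into terminals."""
    if symbol in seen:
        return float("inf")  # recursive path: never the cheapest choice
    return min(expansion_cost(grammar, e, seen | {symbol})
               for e in grammar[symbol])


def expansion_cost(grammar, expansion, seen=frozenset()):
    """One step for the expansion itself plus the cost of its nonterminals."""
    nonterminals = RE_NONTERMINAL.findall(expansion)
    return 1 + sum(symbol_cost(grammar, s, seen) for s in nonterminals)


def produce(grammar, symbol="<start>", depth=0, max_depth=5, rng=random):
    """Expand `symbol` into a string of terminals."""
    expansions = grammar[symbol]
    if depth >= max_depth:
        # Past the depth limit, close off with the cheapest expansion.
        expansion = min(
            expansions,
            key=lambda e: expansion_cost(grammar, e, frozenset({symbol})))
    else:
        expansion = rng.choice(expansions)
    # Splitting on a capturing group keeps the nonterminals in the result.
    parts = re.split(RE_NONTERMINAL, expansion)
    return "".join(
        produce(grammar, p, depth + 1, max_depth, rng) if p in grammar else p
        for p in parts)


rng = random.Random(0)
for _ in range(5):
    print(produce(GRAMMAR, rng=rng))
```

The depth limit is the main design choice here: without it, recursive rules such as `<expr>` could keep expanding for a very long time.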

## Parsing with Grammars

We can parse a given input using a grammar:

In [None]:
expr_input = "2 + -2"

In [None]:
from Parser import EarleyParser, display_tree, tree_to_string

In [None]:
expr_parser = EarleyParser(EXPR_GRAMMAR)

In [None]:
expr_tree = list(expr_parser.parse(expr_input))[0]

In [None]:
display_tree(expr_tree)

Internally, each subtree is a pair of a node and a list of children (subtrees):

In [None]:
expr_tree
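
To illustrate the `(node, children)` layout without running the parser, the following sketch builds a tree for `1 + 2` by hand and prints it as an indented outline. The tree is written out manually under the assumption that terminal symbols are nodes with an empty list of children; the helpers `number` and `outline` are hypothetical names used only here.

```python
def number(digit):
    """Hand-written subtree deriving a single digit from <term>."""
    return ("<term>", [("<factor>", [("<integer>", [("<digit>", [(digit, [])])])])])


# Derivation tree for "1 + 2", written out by hand.
TREE = ("<start>", [
    ("<expr>", [
        number("1"),
        (" + ", []),
        ("<expr>", [number("2")]),
    ]),
])


def outline(tree, depth=0):
    """Return one indented line per node, in depth-first order."""
    node, children = tree
    lines = ["  " * depth + repr(node)]
    for child in children:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(TREE)))
```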

## Mutating a Tree

We define a simple mutator that swaps `+` and `-` nodes, and a helper that applies a mutator to every node of a derivation tree.

In [None]:
def swap_plus_minus(tree):
    """Return `tree` with its root node swapped between ' + ' and ' - '."""
    node, children = tree
    if node == " + ":
        node = " - "
    elif node == " - ":
        node = " + "
    return node, children

In [None]:
def apply_mutator(tree, mutator):
    """Apply `mutator` to `tree`, then recursively to the resulting children."""
    node, children = mutator(tree)
    return node, [apply_mutator(c, mutator) for c in children]

In [None]:
mutated_tree = apply_mutator(expr_tree, swap_plus_minus)

In [None]:
display_tree(mutated_tree)
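
Two properties of this scheme are worth checking on a small example: the original tree is left untouched, and applying the swap twice gives back the original. The sketch below repeats the two functions so that it runs on its own, and uses a hand-written, abbreviated tree (an assumption for illustration, not parser output).

```python
def swap_plus_minus(tree):
    node, children = tree
    if node == " + ":
        node = " - "
    elif node == " - ":
        node = " + "
    return node, children


def apply_mutator(tree, mutator):
    node, children = mutator(tree)
    return node, [apply_mutator(c, mutator) for c in children]


# Abbreviated, hand-written tree for "1 + 2".
TREE = ("<expr>", [
    ("<term>", [("1", [])]),
    (" + ", []),
    ("<expr>", [("<term>", [("2", [])])]),
])

once = apply_mutator(TREE, swap_plus_minus)
twice = apply_mutator(once, swap_plus_minus)

print(once[1][1])     # the operator node after one mutation
print(TREE[1][1])     # the operator node in the original tree
print(twice == TREE)  # whether swapping twice restores the original
```

Because `apply_mutator` builds new tuples and lists on the way down, mutators can be chained or compared against the original without copying the tree first.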

## Unparsing the Mutated Tree

To unparse, we traverse the tree and concatenate all terminal symbols in order:

In [None]:
tree_to_string(mutated_tree)
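
The idea behind unparsing can be sketched in a few lines. This is a simplified stand-in for `tree_to_string`, not its actual code; the function name `unparse` and the regular expression for recognizing nonterminals are assumptions.

```python
import re

# Assumption: nonterminals are <...> tokens without spaces.
RE_NONTERMINAL = re.compile(r"<[^<> ]*>")


def unparse(tree):
    """Concatenate the terminal symbols of `tree` from left to right."""
    node, children = tree
    if children:
        return "".join(unparse(child) for child in children)
    # A leaf: terminals contribute their text, while a nonterminal
    # without children stands for the empty string.
    return "" if RE_NONTERMINAL.fullmatch(node) else node


# Abbreviated, hand-written tree for "2 - -2".
TREE = ("<expr>", [
    ("<term>", [("2", [])]),
    (" - ", []),
    ("<expr>", [("<factor>", [("-", []), ("<factor>", [("2", [])])])]),
])

print(unparse(TREE))
```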

## Lots of Mutations

Combining the steps above, we fuzz an input, parse it, mutate the tree, and unparse the result:

In [None]:
for i in range(10):
    s = expr_fuzzer.fuzz()
    s_tree = list(expr_parser.parse(s))[0]
    s_mutated_tree = apply_mutator(s_tree, swap_plus_minus)
    s_mutated = tree_to_string(s_mutated_tree)
    print('    ' + s + '\n->  ' + s_mutated + '\n')

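## Another Example: JSON

The same steps work for other input formats. We start from a JSON grammar in EBNF notation, which uses `*` and `+` for repetition, and convert it to a plain grammar before parsing.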

In [None]:
import string

In [None]:
CHARACTERS_WITHOUT_QUOTE = (string.digits
                            + string.ascii_letters
                            + string.punctuation.replace('"', '').replace('\\', '')
                            + ' ')

In [None]:
JSON_EBNF_GRAMMAR = {
    "<start>": ["<json>"],
    "<json>": ["<element>"],
    "<element>": ["<ws><value><ws>"],
    "<value>": ["<object>", "<array>", "<string>", "<number>", "true", "false", "null"],
    "<object>": ["{<ws>}", "{<members>}"],
    "<members>": ["<member>(,<members>)*"],
    "<member>": ["<ws><string><ws>:<element>"],
    "<array>": ["[<ws>]", "[<elements>]"],
    "<elements>": ["<element>(,<elements>)*"],
    "<string>": ['"' + "<characters>" + '"'],
    "<characters>": ["<character>*"],
    "<character>": srange(CHARACTERS_WITHOUT_QUOTE),
    "<number>": ["<int><frac><exp>"],
    "<int>": ["<digit>", "<onenine><digits>", "-<digit>", "-<onenine><digits>"],
    "<digits>": ["<digit>+"],
    "<digit>": ['0', "<onenine>"],
    "<onenine>": crange('1', '9'),
    "<frac>": ["", ".<digits>"],
    "<exp>": ["", "E<sign><digits>", "e<sign><digits>"],
    "<sign>": ["", '+', '-'],
    "<ws>": ["( )*"]
}

assert is_valid_grammar(JSON_EBNF_GRAMMAR)

In [None]:
JSON_GRAMMAR = convert_ebnf_grammar(JSON_EBNF_GRAMMAR)

In [None]:
syntax_diagram(JSON_GRAMMAR)
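
To show what the conversion step has to do, here is a deliberately limited sketch that rewrites `?`, `*`, and `+` when they directly follow a nonterminal. It does not handle parenthesized groups such as `(,<members>)*`, and it is not the code of `convert_ebnf_grammar`; the function name `expand_operators` and the naming scheme for helper symbols are assumptions.

```python
import re

# A nonterminal immediately followed by an EBNF operator.
RE_EXTENDED = re.compile(r"(<[^<> ]*>)([?+*])")


def expand_operators(ebnf_grammar):
    """Replace <x>?, <x>* and <x>+ by freshly introduced helper rules."""
    grammar = {}
    counter = 0
    for symbol, expansions in ebnf_grammar.items():
        new_expansions = []
        for expansion in expansions:
            match = RE_EXTENDED.search(expansion)
            while match is not None:
                inner, operator = match.groups()
                counter += 1
                helper = f"<{symbol[1:-1]}-{counter}>"  # hypothetical naming
                expansion = (expansion[:match.start()] + helper
                             + expansion[match.end():])
                if operator == "?":
                    grammar[helper] = ["", inner]
                elif operator == "*":
                    grammar[helper] = ["", inner + helper]
                else:  # "+": one or more repetitions
                    grammar[helper] = [inner, inner + helper]
                match = RE_EXTENDED.search(expansion)
            new_expansions.append(expansion)
        grammar[symbol] = new_expansions
    return grammar


EBNF = {
    "<start>": ["<sign>?<digits>"],
    "<sign>": ["+", "-"],
    "<digits>": ["<digit>+"],
    "<digit>": ["0", "1"],
}

BNF = expand_operators(EBNF)
for symbol, expansions in BNF.items():
    print(symbol, "::=", expansions)
```

Each operator becomes a new recursive rule, so the resulting grammar only uses plain alternatives and can be handled by a parser that knows nothing about EBNF.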

In [None]:
json_input = '{"conference": "ICSE"}'

In [None]:
json_parser = EarleyParser(JSON_GRAMMAR)

In [None]:
json_tree = list(json_parser.parse(json_input))[0]

In [None]:
display_tree(json_tree)

In [None]:
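def swap_venue(tree):
    """Replace any subtree that unparses to '"ICSE"' with a parse of '"ICST"'."""
    if tree_to_string(tree) == '"ICSE"':
        # The replacement is parsed from <start>, so a complete
        # derivation tree is spliced in at this position.
        tree = list(json_parser.parse('"ICST"'))[0]
    return tree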

In [None]:
mutated_tree = apply_mutator(json_tree, swap_venue)

In [None]:
tree_to_string(mutated_tree)
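
One way to sanity-check a mutated string is to feed it to an independent JSON implementation. The sketch below uses Python's standard `json` module for this. The helper `is_valid_json` is a hypothetical name, and the value of `mutated` is what we would expect the mutation above to produce, stated here as an assumption rather than copied from its output.

```python
import json


def is_valid_json(text):
    """Return True if `text` is accepted by Python's json module."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Assumed result of the mutation above.
mutated = '{"conference": "ICST"}'

print(is_valid_json(mutated))
print(json.loads(mutated)["conference"])
```

Such a check is useful because a mutator that splices subtrees may produce strings the grammar accepts but a real JSON parser rejects, or the other way round.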