# Second Project

In this project, you will implement a tool that, given a Python program and a Python parser, reduces the program to a minimal program that still produces the same parsing error.  
To do this, your tool should make use of the Delta Debugging approach by Zeller et al. However, instead of just deleting parts of the input, your tool should ensure the validity of the produced Python code, i.e., it should still be able to run in a non-faulty interpreter.  
This is done by manipulating the AST of the Python code.  

The time frame for this project is **2 weeks** and the deadline is **January 15th 23:59**.

## Intelligent Reductions

The changes should be made in such a way that the code is still executable within a standard Python interpreter. Therefore, you are required to implement certain modifications that can be applied to the code. In general, these transformations can be one of the following:
* delete a node  
* substitute a node with the _pass_ node  
* substitute a node with all of its children  
* substitute a node with one of its children  

For instance:
* Replace a `BoolOp` node by `True`.
* Replace a `BoolOp` node by `False`.
* Replace a `BoolOp` node by its left operand.
* Replace a `BoolOp` node by its right operand.
* Replace an `If` node by its "then" body.
* Replace an `If` node by its condition.
* Replace an `If` node by its "else" body.
* Replace all instances of a variable by a constant.
* Replace expressions by a constant.

See the [official Python `ast` reference](http://docs.python.org/3/library/ast) for a list of nodes.
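
As a rough illustration of two of the transformations listed above, the following sketch uses `ast.NodeTransformer`. The class names `BoolOpToLeftOperand` and `IfToThenBody` are hypothetical, and `ast.unparse` assumes Python 3.9 or newer (the notebook itself uses `astor.to_source`). It is not the expected solution, only one way such a modification could look.

```python
import ast


class BoolOpToLeftOperand(ast.NodeTransformer):
    """Replace every BoolOp node by its first (left) operand."""

    def visit_BoolOp(self, node):
        # Transform the children first so nested BoolOps are reduced too.
        self.generic_visit(node)
        return node.values[0]


class IfToThenBody(ast.NodeTransformer):
    """Replace every If node by the statements of its "then" body."""

    def visit_If(self, node):
        self.generic_visit(node)
        # Returning a list splices the statements into the parent's body.
        return node.body


source = "if a and b:\n    x = 1\nelse:\n    x = 2\n"

tree = BoolOpToLeftOperand().visit(ast.parse(source))
without_boolop = ast.unparse(ast.fix_missing_locations(tree))

tree = IfToThenBody().visit(ast.parse(source))
without_if = ast.unparse(ast.fix_missing_locations(tree))

print(without_boolop)
print(without_if)
```

Both transformers rewrite every matching node at once. A reducer that applies one change at a time would instead need to target a single node per step.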

### Must-have implementation
Implement a reducer which minifies a given program so that its parser still produces an error.
To this end, collect all possible transformations over the nodes of the AST and then apply one change at a time. Repeat these modifications until no further change can be applied while still triggering the parser exception.

*Note: Implementing the given modifications should be sufficient to successfully complete this project.*
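
The "one change at a time" loop described above could be sketched as follows. This is a simplified assumption, not the reference solution: the only transformation is deleting a single statement from a `body` list, and the names `candidates`, `reduce_greedily`, and `contains_boolop` are hypothetical. The oracle `contains_boolop` stands in for a parser that raises `ParserException`. `ast.unparse` assumes Python 3.9 or newer.

```python
import ast
import copy


def candidates(tree):
    """Yield copies of `tree`, each with exactly one statement removed."""
    for index, node in enumerate(ast.walk(tree)):
        body = getattr(node, "body", None)
        if not isinstance(body, list):
            continue
        for position, statement in enumerate(body):
            if isinstance(statement, ast.Pass):
                continue  # removing `pass` would not shrink the program
            variant = copy.deepcopy(tree)
            # ast.walk is deterministic, so the same index finds the copy.
            target = list(ast.walk(variant))[index]
            del target.body[position]
            if not target.body and not isinstance(target, ast.Module):
                target.body.append(ast.Pass())  # keep the block valid
            yield ast.fix_missing_locations(variant)


def reduce_greedily(source, still_fails):
    """Apply one change at a time until no change keeps the failure."""
    tree = ast.parse(source)
    progress = True
    while progress:
        progress = False
        for variant in candidates(tree):
            if still_fails(ast.unparse(variant)):
                tree = variant   # accept the smaller program
                progress = True
                break            # recollect candidates for the new tree
    return ast.unparse(tree)


def contains_boolop(code):
    """Hypothetical oracle: True if the code contains a BoolOp node."""
    return any(isinstance(n, ast.BoolOp) for n in ast.walk(ast.parse(code)))


reduced = reduce_greedily(
    "a = 1\nb = 2\nif a and b:\n    c = 0\nd = 1\n", contains_boolop
)
print(reduced)
```

The loop restarts after every accepted change, which is simple but repeats a lot of work; this is the inefficiency the may-have part addresses.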

### May-have implementation
The must-have implementation aims for correctness but is very inefficient. It can be optimised, for instance, with the help of the delta debugging approach. Implement an AST delta debugger which efficiently prunes the nodes.

Hint: Check the _Hierarchical Delta Debugging_ paper from the Background section of the **Reducing Failure-Inducing Inputs** chapter.
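
To illustrate the idea, here is a minimal sketch of plain (flat, not hierarchical) delta debugging over the top-level statements of a module. The function names `ddmin` and `fails` are assumptions made for this example, and the oracle simply checks for a `List` node instead of calling one of the parsers. A hierarchical variant would apply the same procedure level by level in the AST.

```python
import ast


def ddmin(items, fails):
    """Return a reduced sublist of `items` for which `fails` is still True."""
    assert fails(items)
    n = 2  # number of chunks the list is split into
    while len(items) >= 2:
        chunk = len(items) // n
        reduced = False
        for start in range(0, len(items), chunk):
            # Try the list without the chunk starting at `start`.
            complement = items[:start] + items[start + chunk:]
            if fails(complement):
                items = complement
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if n == len(items):
                break  # finest granularity reached, nothing removable
            n = min(n * 2, len(items))
    return items


def fails(statements):
    """Hypothetical oracle: True if the kept statements contain a list."""
    module = ast.Module(body=list(statements), type_ignores=[])
    return any(isinstance(node, ast.List) for node in ast.walk(module))


source = "a = 1\nb = 0\nc = [1, 2, 3]\nd = a + b\n"
statements = ast.parse(source).body

minimal = ddmin(statements, fails)
reduced = ast.unparse(ast.Module(body=minimal, type_ignores=[]))
print(reduced)
```

Removing whole chunks first lets the search discard large irrelevant parts with few oracle calls before falling back to single statements.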

## Evaluation

We evaluate your project based on public as well as secret tests. In this section, we provide **five** different Python parsers as well as **five** Python programs, which should be minified. These parsers check for a specific property in the code and fail the execution if the property exists. The programs and parsers in this section make up the public test cases. If you pass all of those tests **without hardcoding the modifications**, you are guaranteed to score at least 15 points in this project.

In [1]:
import inspect
import ast
import sys
import astor
import abc

In [2]:
class ParserException(Exception):
    pass

class Parser(ast.NodeVisitor, metaclass=abc.ABCMeta):

    def parse(self, file):
        tree = ast.parse(source=file)
        self.visit(tree)
    
    @abc.abstractmethod
    def original(): 
        pass
    @abc.abstractmethod
    def minimized(): 
        pass
    
    def get_source(self, source):
        first_line = source[0]
        indentation = len(first_line) - len(first_line.lstrip())
        return ''.join([line[indentation:] for line in source])

    def get_original(self):
        source = inspect.getsourcelines(self.original)[0][1:]
        return self.get_source(source)

    def get_minimized(self):
        source = inspect.getsourcelines(self.minimized)[0][2:]
        return self.get_source(source)


In [3]:
class Parser1(Parser):
    """
    Contains boolean operation
    """    
    def visit_BoolOp(self, node):
        raise ParserException
        
    @staticmethod
    def original():
        a = True
        b = not False
        c = 30
        for i in range(c):
            if i == 15:
                if a and b:
                    return 1
        return 0
    @staticmethod    
    def minimized():
        True and True

In [4]:
class Parser2(Parser):
    """
    Contains if statement
    """
    def visit_If(self, node):
        raise ParserException

    @staticmethod        
    def original():
        a = True
        b = not False
        c = 30
        for i in range(c):
            if i == 15:
                if a and b:
                    return 1
        return 0
    @staticmethod
    def minimized():
        if True:
            return    

In [5]:
class Parser3(Parser):
    """
    Contains special unicode character
    """
    def __init__(self) -> None:
        self.assignment = False
        self.steps = 0

    def check_unicode(self, string):
        return string == u'\u0426'

    def generic_visit(self, node):
        self.steps += 1
        ast.NodeVisitor.generic_visit(self, node)

    def visit_Assign(self, node):
        self.assignment = True
        self.steps = 0
        self.generic_visit(node)

    def visit_Str(self, node):
        if self.assignment and self.steps == 3:
            if self.check_unicode(node.s):
                raise ParserException
    @staticmethod
    def original():
        a = 1 
        b = a
        c = a - b
        if c < a:
            d = ''
            while a == b:
                d = u'\u0426'
                a += 1
            return d
        return ''
    @staticmethod
    def minimized():
        d = u'\u0426'

In [6]:
class Parser4(Parser):
    """
    Variable not defined
    """
    def __init__(self) -> None:
        self.assignment = False
        self.steps = 0
        self.variables = set()

    def generic_visit(self, node):
        self.steps += 1
        ast.NodeVisitor.generic_visit(self, node)

    def visit_Name(self, node):
        if self.assignment and self.steps == 1:
            self.variables.add(node.id)
            self.assignment = False
            self.generic_visit(node)
        elif node.id in self.variables:
            self.generic_visit(node)
        else:
            raise ParserException

    def visit_Assign(self, node):
        self.assignment = True
        self.steps = 0
        self.generic_visit(node)
    @staticmethod
    def original():
        a = 1 
        b = a
        c = a - b
        if c < a:
            while a == b:
                a += 1
            return d
        return ''
    @staticmethod
    def minimized():
        return d

In [7]:
class Parser5(Parser):
    """
    Should contain a list
    """

    def visit_List(self, node):
        raise ParserException
        
    @staticmethod        
    def original():
        a = 1
        b = 0
        while True:
            if a < b:
                return [1, 2, 3]
            else:
                return []
            
    @staticmethod
    def minimized():
        []

Let's look at an example:

In [8]:
class Parser0(Parser):
    """
    Contains boolean operation
    """    
    def visit_BoolOp(self, node):
        raise ParserException("Parsing error")
        
    @staticmethod
    def original():
        a = 1
        b = 2
        if a and b:
            return 0
        return 1
    
    @staticmethod    
    def minimized():
        True and True


In [9]:
sys.path.append("../debuggingbook/notebooks/")
sys.path.append("../debuggingbook/docs/beta/code/")
import bookutils

In [10]:
from DeltaDebugger import NodeCollector, DeltaDebugger, ExpectError, copy_and_reduce
from showast import show_ast

In [11]:
class Reducer():
    def __init__(self, parser):
        self.parser = parser
        self.minimized_tree = None
        self.minimized_code = None
    def minimize(self):
        """
        Reduces the program
        Overwrite this
        """
        self.minimized_code = self.parser.get_minimized()
        return self.minimized_code

Let's try out the delta debugger implemented in the **Reducing Failure-Inducing Inputs** chapter.

In [12]:
class DeltaDebuggerReducer(Reducer):

    def minimize(self):
        def compile_and_test_ast(tree, keep_list, parser):
            new_tree = copy_and_reduce(tree, keep_list)
            code = astor.to_source(new_tree)
            if not code:
                raise SyntaxError("Empty code")
            try:
                code_object = compile(new_tree, '<string>', 'exec')
            except Exception:
                raise SyntaxError("Cannot compile")
            parser.parse(code)
        source = self.parser.get_original()
        fun_tree = ast.parse(source)
        fun_nodes = NodeCollector().collect(fun_tree)
        with DeltaDebugger() as dd:
            compile_and_test_ast(fun_tree, fun_nodes, self.parser)
        reduced_nodes = dd.min_args()['keep_list']
        reduced_fun_tree = copy_and_reduce(fun_tree, reduced_nodes)
        self.minimized_tree = reduced_fun_tree
        self.minimized_code = astor.to_source(reduced_fun_tree)
        return self.minimized_code

In [13]:
parser = Parser0()
ddr = DeltaDebuggerReducer(parser)

In [14]:
minimized = ddr.minimize()

In [15]:
print("Original code:\n", ddr.parser.get_original())
print("Minimized code:\n", minimized)

Original code:
 def original():
    a = 1
    b = 2
    if a and b:
        return 0
    return 1

Minimized code:
 def original():
    pass
    pass
    if a and b:
        return 0



## Implementation

In [16]:
class MyReducer(Reducer):

    def minimize(self):
        # TODO: implement this!
        return self.parser.get_original()

## Tests

The following section introduces the public test cases, which are used to assess your performance in the project. Passing every test will be enough to complete the project successfully.


In [17]:
import unittest

In [18]:
THRESHOLD = 6

In [19]:
'''
This node counter is used to assess the amount of reduction performed by your reducer.
'''
class NodeCounter(ast.NodeVisitor):

    def __init__(self) -> None:
        self.num_nodes = 0

    def visit(self, node):
        self.num_nodes += 1
        self.generic_visit(node)

    def count(self, source):
        tree = ast.parse(source=source)
        self.visit(tree)
        return self.num_nodes

In [20]:
class PublicTests(unittest.TestCase):

    def __init__(self, parser, reducer):
        self.parser = parser()
        self.reducer = reducer(self.parser)
        
    def count_nodes(self, source):
        if source is None:
            return 10000
        return NodeCounter().count(source)

    def run_tests(self):
        print(f'Running tests for {self.parser}:')
        # TODO: run their minimizer/delta debugger with a timeout
        reduced = self.reducer.minimize()
        self.has_property(reduced)
        self.is_minimized(reduced)

    def has_property(self, reduced):
        try:
            self.assertRaises(ParserException, lambda: self.parser.parse(reduced))
            print(f'HAS PROPERTY: OK')    
        except Exception as e:
            print(f'HAS PROPERTY: FAIL {e}')
    
    def is_minimized(self, reduced):
        count_minimized = self.count_nodes(reduced)
        count_template = self.count_nodes(self.parser.get_minimized())
        try:
            assert(count_minimized <= count_template + THRESHOLD)
            print(f'IS MINIMIZED: OK')    
        except Exception as e:
            print(f'IS MINIMIZED: FAIL {e}')
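
To make the `is_minimized` criterion more concrete, the sketch below recomputes the comparison by hand. It assumes that counting with `ast.walk` matches what `NodeCounter` does (both visit every child node), and it uses the `DeltaDebuggerReducer` output printed earlier for `Parser0` as the candidate. The helper name `count_nodes` is only for this illustration.

```python
import ast

THRESHOLD = 6  # same value as in the cell above


def count_nodes(source):
    """Count all AST nodes reachable from the module root."""
    return sum(1 for _ in ast.walk(ast.parse(source)))


# Template from Parser0.minimized() and the reduced code printed earlier.
template = "True and True"
candidate = (
    "def original():\n"
    "    pass\n"
    "    pass\n"
    "    if a and b:\n"
    "        return 0\n"
)

template_count = count_nodes(template)
candidate_count = count_nodes(candidate)
passes = candidate_count <= template_count + THRESHOLD
print(template_count, candidate_count, passes)
```

A reduced program passes only if it has at most `THRESHOLD` more nodes than the template, so leftover wrappers such as the function definition and `pass` statements count against it.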

### Tests for parser 1

In [21]:
test1 = PublicTests(Parser1, MyReducer)
test1.run_tests()

Running tests for <__main__.Parser1 object at 0x10fa07fd0>:
HAS PROPERTY: OK
IS MINIMIZED: FAIL 


### Tests for parser 2

In [22]:
test2 = PublicTests(Parser2, MyReducer)
test2.run_tests()

Running tests for <__main__.Parser2 object at 0x10f989490>:
HAS PROPERTY: OK
IS MINIMIZED: FAIL 


### Tests for parser 3

In [23]:
test3 = PublicTests(Parser3, MyReducer)
test3.run_tests()

Running tests for <__main__.Parser3 object at 0x10f928c10>:
HAS PROPERTY: OK
IS MINIMIZED: FAIL 


### Tests for parser 4

In [24]:
test4 = PublicTests(Parser4, MyReducer)
test4.run_tests()

Running tests for <__main__.Parser4 object at 0x10df2d9d0>:
HAS PROPERTY: OK
IS MINIMIZED: FAIL 


### Tests for parser 5

In [25]:
test5 = PublicTests(Parser5, MyReducer)
test5.run_tests()

Running tests for <__main__.Parser5 object at 0x10fa07ad0>:
HAS PROPERTY: OK
IS MINIMIZED: FAIL 


In [26]:
test1 = PublicTests(Parser0, DeltaDebuggerReducer)
test1.run_tests()

Running tests for <__main__.Parser0 object at 0x10f9ba490>:
HAS PROPERTY: OK
IS MINIMIZED: FAIL 
