Merge branch 'master' of https://github.com/johngunderman/fuzzbuzz

timtadh · Apr 27, 2012 · 8ad13e0
2 parents 6f0c34e + efa7248
commit 8ad13e0

Showing 4 changed files with 74 additions and 14 deletions.
diff --git a/papers/eecs444_report/attrgram.tex b/papers/eecs444_report/attrgram.tex
@@ -116,7 +116,8 @@ \subsection{Generating Strings from Attribute Grammars}
 statements and the actions are restricted to assignments, if-then, and
 if-then-else statements the problem is NP-Hard. Appendix \ref{hard} contains a
 proof that Attribute Grammar String Generation (AGSG) belongs to the class
-NP-Complete.
+NP-Complete. To the authors' knowledge, this is the first formal attempt to
+classify this particular problem.
 
 Our program ``Fuzzbuzz'' implements string generation from attribute grammars
 and it does so in a top down manner. It starts at the start symbol and
@@ -130,3 +131,59 @@ \subsection{Generating Strings from Attribute Grammars}
 algorithm further details are left for the adventurous reader in the source of
 the program.\footnote{github.com/timtadh/fuzzbuzz}
 
+\subsection{Related Work}
+
+Modern work on generating strings from context-free grammars for use in
+automated test case generation can be traced back to the algorithm due to
+Purdom.\cite{Purdom1972} Purdom's algorithm was designed specifically to test
+LR(1) parsers and as such was unconcerned with the semantics of a language. In
+particular, Purdom hoped to catch bugs related to automated parser generator
+construction. Our concerns are broader: we hope to identify faults throughout
+a software system, not just faults in the parser. Therefore, we cannot ignore
+the semantics of a language.
+
+Ten years later Duncan and Hutchinson became interested in using the attribute
+grammar formalism to create testable specifications.\cite{Duncan1981}
+Duncan and Hutchinson's goal was to create a formal way of specifying the input
+and output languages of a program. They then used these specifications to test
+the program for conformance to the predefined specifications. They created two
+test case generation algorithms. In one version they choose which grammar rule
+to apply ``randomly'' or based on user-supplied heuristics. In another version all
+choices are predetermined to enable systematic enumeration of test cases from
+the grammar.
+
+The difficulty in Duncan and Hutchinson's algorithm (as with all attribute
+grammar string generation algorithms) comes with processing the actions and
+conditions. In their terminology actions are actions but conditions are termed
+``guards'' (as in they guard the application of a grammar rule). Duncan and
+Hutchinson attribute their use of guards to Milton whose 1979 paper discusses
+parsing LL(k) attributed grammars.\cite{Milton1979} Milton notes that attribute
+grammars can generate a type-0 language. However, the attribute grammars used in
+this paper cannot. They are necessarily restricted such that generation takes
+exponential time in the worst case rather than being undecidable.
+
+Unfortunately, Duncan and Hutchinson's paper is light on details of their actual
+generation algorithm. Their prose description highlights many of the problems
+future authors might encounter but does not give precise details on their
+solutions. Since the authors worked at the GE Research \& Development Center it
+could be they considered their algorithm proprietary. However, this situation is
+sadly commonplace in the automated test case generation literature. Many papers
+fail to adequately explain their generative algorithms.
+
+Another ten years passed and test case generation using attribute grammars
+re-emerged. Maurer constructed a black box fuzzing system which includes
+actions and conditions as a feature in 1990.\cite{Maurer1990} However, while
+Maurer rhapsodizes on the usefulness and necessity of the feature he fails to
+describe his generative algorithm in any detail. Additionally, like Milton and
+Duncan, Maurer does not formally consider the computational complexity of the
+generation process.
+
+Finally, modern blackbox fuzzing systems like
+PeachFuzz\footnote{http://peachfuzzer.com/} utilize primitive forms of
+contextual constraints. They don't tend to utilize the attribute grammar
+formalism but their constraints can easily be represented as an attribute
+grammar. PeachFuzz and other systems usually require the laborious construction
+of grammars (typically specified in custom XML). In the future we hope to
+alleviate this problem with automated inference of actions and conditions from a
+corpus of operational examples.
+
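Editor's note: the attrgram.tex text above describes top-down generation starting at the start symbol, with rules chosen at random and conditions ("guards") restricting which strings are acceptable. The following is a minimal sketch of that idea, not the Fuzzbuzz implementation and not Duncan and Hutchinson's algorithm. The toy grammar, the function names `generate` and `generate_guarded`, the depth bound, and the naive generate-and-test handling of the guard are all assumptions made for illustration.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives.
GRAMMAR = {
    "Expr": [["Term", "+", "Expr"], ["Term"]],
    "Term": [["NUM"]],
}


def generate(symbol, rng, depth=0, max_depth=8):
    """Expand `symbol` top-down, choosing among its rules at random."""
    if symbol not in GRAMMAR:
        # Terminal symbol: NUM becomes a random digit, anything else is literal.
        if symbol == "NUM":
            return [str(rng.randint(0, 9))]
        return [symbol]
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth bound, take the shortest rule to force termination.
        rule = min(alternatives, key=len)
    else:
        rule = rng.choice(alternatives)
    tokens = []
    for child in rule:
        tokens.extend(generate(child, rng, depth + 1, max_depth))
    return tokens


def generate_guarded(rng, guard, max_tries=1000):
    """Naive generate-and-test: retry until the synthesized value passes `guard`."""
    for _ in range(max_tries):
        tokens = generate("Expr", rng)
        # Synthesized attribute (assumed for this sketch): the sum of all numbers.
        value = sum(int(t) for t in tokens if t != "+")
        if guard(value):
            return tokens, value
    return None
```

For example, `generate_guarded(random.Random(0), lambda v: v % 2 == 0)` returns a token list whose numbers sum to an even value. Rejection sampling like this can need many retries when the guard is rarely satisfied, which is one informal way to see why conditions make generation hard.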
diff --git a/papers/eecs444_report/cfgstats.tex b/papers/eecs444_report/cfgstats.tex
@@ -126,8 +126,10 @@ \subsection{Literature Review}
 
 \subsection{Results}
 
-It should be noted that the CFGStats engine does not produce semantically
-correct output. For example, a Fuzzbuzz run produced the following:
+
+It should be noted that the CFGStats engine does not produce
+semantically correct output. For example, a Fuzzbuzz run using a
+grammar which described calculator input produced the following:
 
 \begin{verbatim}
 2.74432441811 - < 0.424813324932 , 6.79695572322 >

diff --git a/papers/eecs444_report/hard.tex b/papers/eecs444_report/hard.tex
@@ -8,7 +8,7 @@ \section{This is a Hard Problem}
 
 \begin{enumerate}[1.]
 \item
-  Reduction 3SAT -\textgreater{} AGSG {[}Shows AGSG at least NP-Hard{]}
+  Reduction 3SAT $\rightarrow$ AGSG {[}Shows AGSG at least NP-Hard{]}
 \item
   Proof of Poly-Time Verifiable Certificate {[}AGSG in NP{]}
 \end{enumerate}
@@ -18,7 +18,7 @@ \subsubsection{Decision Problem}
 
 Is the language generated by this grammar the empty language?
 
-\subsection{Reduction, 3SAT -\textgreater{} AGSG {[}Attribute Grammar
+\subsection{Reduction, 3SAT $\rightarrow$ AGSG {[}Attribute Grammar
 String Generation{]}}
 
 \subsubsection{Construct a grammar for the 3SAT instance}
@@ -53,7 +53,7 @@ \subsubsection{Construct a grammar for the 3SAT instance}
 xN -> TRUE
     | FALSE
     ;
-\engd{verbatim}
+\end{verbatim}
 With attributes which synthesize the values for each clause and with a
 condition which asserts that the whole expression is true.
 
@@ -202,7 +202,7 @@ \subsubsection{Construct a grammar for the 3SAT instance}
 O(\textbar{}C\textbar{} + \textbar{}V\textbar{}) where C = the clauses
 in 3SAT and V = the variables.
 
-\subsubsection{Show a solution for 3SAT ---\textgreater{} creates a
+\subsubsection{Show a solution for 3SAT $\rightarrow$ creates a
 parsable string for AGSG}
 
 In a solution for 3SAT all names (x1, x2, \ldots{} xN) always have
@@ -244,7 +244,7 @@ \subsubsection{Show a solution for 3SAT ---\textgreater{} creates a
 synthesize a value of \verb!True! on \verb!AndN.value! resulting in a
 parsable expression.
 
-\subsubsection{Show a string generated by the grammar ---\textgreater{}
+\subsubsection{Show a string generated by the grammar $\rightarrow$
 is a solution for 3SAT}
 
 In the above proof, we showed that the attributes on the grammar compute

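Editor's note: the hard.tex hunks above describe a grammar in which each variable xN derives TRUE or FALSE, attributes synthesize a value for each clause, and a condition asserts that the whole expression is true. A rough sketch of what that condition computes is given below. The clause encoding (signed integers, where `-3` means "not x3") and the names `condition_holds` and `find_derivation` are assumptions for illustration and do not come from the report.

```python
from itertools import product


def condition_holds(clauses, assignment):
    """Evaluate the top-level condition: every clause has at least one true literal.

    `assignment` maps a variable index to the value chosen by its
    `xN -> TRUE | FALSE` production. The check runs in time linear in the
    number of literals, mirroring a poly-time certificate check.
    """
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


def find_derivation(clauses, num_vars):
    """Brute force over all 2^num_vars choices of TRUE/FALSE productions."""
    for values in product([True, False], repeat=num_vars):
        assignment = dict(enumerate(values, start=1))
        if condition_holds(clauses, assignment):
            return assignment
    return None  # No derivation satisfies the condition: the language is empty.
```

Under these assumptions, a returned assignment corresponds to a string the grammar can generate, and `None` corresponds to the empty language asked about in the decision problem.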
diff --git a/papers/eecs444_report/mutation.tex b/papers/eecs444_report/mutation.tex
@@ -117,12 +117,13 @@ \subsection{Methods}
 functions. The first of these is the selection function. This function
 decides what subtree of the AST should be mutated. While a simplistic
 algorithm is currently employed, the system allows for the selection
-function to be easily swapped out. The second important function is
-the generator function. This function generates a subtree for the
-given non-terminal. This part in particular is where the CFG stat and
-attribute grammar implementations become very useful. While not
-currently implemented as such, these two tree generation techniques
-can be used for better mutation generation.
+function to be easily swapped out.
+
+The second important function is the generator function. This function
+generates a subtree for the given non-terminal. This part in
+particular is where the CFGStats and attribute grammar implementations
+become very useful. While not currently implemented as such, these two
+tree generation techniques can be used for better mutation generation.
 
 Other implementations of these two functions can easily be plugged
 in. In particular, Papadakis' approach utilizing graph coverage could