Pygmalion depends on a number of packages, all of which can be installed using the following command:
$ make req
Note that we have strict requirements for Python: Pygmalion is tested only on Python 3.6.5. It may work on later versions, but it will not work on earlier versions because we rely on constructs introduced in 3.6.
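A quick way to confirm that the interpreter meets this requirement before running make req is sketched below. The helper name `python_is_supported` is hypothetical and is not part of Pygmalion; the only fact taken from the text above is the 3.6 minimum.

```python
import sys

# Minimum version stated above; the project itself is only tested on 3.6.5.
MIN_VERSION = (3, 6)


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is at least 3.6."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_VERSION


if not python_is_supported():
    print("Python 3.6 or later is required, found %d.%d" % sys.version_info[:2])
```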
We have the following subjects under the ./subjects directory
Complete end to end for subject hello.py
First, generate inputs, then derive and evaluate our grammar
$ make xeval.hello
Next, get the human-readable grammar
$ make xbnf.hello
We have the following stages
chain Generates the initial inputs using PyChains
trace Runs the generated inputs through a Python frame tracer. This is the only Python-specific part. Unfortunately, because we mess with settrace, debugging is not available. Hence the next phase is separated out.
track Evaluate the dumped frames to ascertain scopes and causality rules. We essentially retrieve the input stack information from the dumped frames.
mine Mine the parse tree from the stack frames. These are still input-specific (hence the parse tree)
infer Generate the context-free grammar by merging the parse trees. At this point, we can no longer distinguish separate inputs.
refine Try to produce a human-readable grammar
fuzz Generate output from the inferred and refined grammar.
eval Run the generated outputs to find how many are valid and how much coverage is obtained
bnf This is not a stage for grammar evaluation, but can be used to generate human-readable grammars from the refined grammar (depends on refine)
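To illustrate what the trace stage relies on, here is a minimal sketch of a frame tracer built on `sys.settrace`. It is not Pygmalion's tracer: the names `trace_calls`, `first_char`, and `parse` are made up for this example, and the recorded event format is an assumption. It also shows why a debugger cannot be used at the same time, since only one trace function is active per thread.

```python
import sys


def trace_calls(func, *args):
    """Run func(*args) under sys.settrace and record call/return events."""
    events = []

    def tracer(frame, event, arg):
        if event == "call":
            # Snapshot the arguments visible in the newly entered frame.
            events.append(("call", frame.f_code.co_name, dict(frame.f_locals)))
        elif event == "return":
            # For return events, arg holds the value being returned.
            events.append(("return", frame.f_code.co_name, arg))
        return tracer  # keep receiving events for this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before
    return result, events


# A toy subject, standing in for a real parser under ./subjects.
def first_char(text):
    return text[0]


def parse(text):
    return first_char(text).upper()
```

Calling `trace_calls(parse, "hi")` returns the result together with the list of recorded events, which a later stage could then evaluate without the tracer being installed.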
Each stage can be invoked with a target of the form x<stage>.<subject>. For example, for a complete evaluation of microjson.py, the following command would be used:
$ make xeval.microjson
On the other hand, if only the human-readable grammar is necessary, the following command is used:
$ make xbnf.microjson
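The naming scheme of the targets can be sketched as follows. The stage names and the xeval.microjson and xbnf.microjson targets come from the text above; the helper names, and the assumption that every stage needs all stages listed before it, are illustrative only and say nothing definite about the Makefile.

```python
# Stage names in the order listed above. Treating this order as a strict
# dependency chain is an assumption of this sketch.
STAGES = ["chain", "trace", "track", "mine", "infer", "refine", "fuzz", "eval"]


def make_target(stage, subject):
    """Build a target name such as 'xeval.microjson'."""
    if stage != "bnf" and stage not in STAGES:
        raise ValueError("unknown stage: %s" % stage)
    return "x%s.%s" % (stage, subject)


def stages_before(stage):
    """List the stages assumed to run before the given one."""
    if stage == "bnf":
        # bnf is documented as depending on refine, not on fuzz or eval.
        end = STAGES.index("refine") + 1
    else:
        end = STAGES.index(stage)
    return STAGES[:end]
```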
Generating initial inputs using PyChains
The following command generates the initial inputs using PyChains for hello.py
$ make xchain.hello
The result is placed in .pickled/hello.py.chain and can be converted to readable ASCII by
$ ./bin/showpickle.py .pickled/hello.py.chain
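For readers who want to inspect such a file from their own code, the sketch below loads a pickle and pretty-prints it. Whether bin/showpickle.py works this way is an assumption, the layout of the .chain file is not documented here, and the function names are hypothetical. Unpickling may also require the project's modules to be importable, and pickle files should only be loaded from trusted sources.

```python
import pickle
import pprint


def render_pickle(raw_bytes):
    """Unpickle one object from bytes and return a readable rendering."""
    return pprint.pformat(pickle.loads(raw_bytes))


def show_pickle(path):
    """Read a pickle file such as .pickled/hello.py.chain and render it."""
    with open(path, "rb") as handle:
        return render_pickle(handle.read())
```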
A number of environment variables are used to control Pygmalion:
MY_RP Used to indicate how to proceed when an input is accepted. Some subjects, such as urlpy.py and urljava.py, allow single-character inputs that should be extended to produce larger inputs. Default is 1.0. Choose MY_RP=0.1 for urljava.py for reasonable URLs.
NINPUT The number of inputs that the Chain should produce before stopping. Default is 10.
R The random number seed. The default is 0.
NOCTRL Whether to produce characters such as \t\b\f\x012 which are not part of the list string.ascii_letters + string.digits + string.punctuation
NO_LOG (1) If set to 0, we get more informative and verbose output (which slows down the program quite a bit).
python3 The Python interpreter used
pip3 The pip installer command
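The documented defaults can be summarised in a small sketch that reads these variables from the environment. How Pygmalion itself parses them is not shown in this document, so the function names and the conversions to float and int are assumptions; only the variable names, the defaults, and the NOCTRL character list are taken from the descriptions above.

```python
import os
import string


def read_settings(environ=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return {
        "MY_RP": float(env.get("MY_RP", "1.0")),
        "NINPUT": int(env.get("NINPUT", "10")),
        "R": int(env.get("R", "0")),
        "NO_LOG": int(env.get("NO_LOG", "1")),
    }


# The list that the NOCTRL description refers to.
NOCTRL_LIST = string.ascii_letters + string.digits + string.punctuation


def outside_noctrl_list(char):
    """True for characters not in the list, such as tab or backspace."""
    return char not in NOCTRL_LIST
```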
The configuration can be fine-tuned further by modifying these
Pygmalion config variables
config.Track_Params (True) Whether to track function parameters or not
config.Track_Vars (True) Whether to track local variables or not
config.Track_Return (False) Should we insert a special return variable from each function?
config.Ignore_Lambda (True) Strip out noise from lambda expressions
config.Swap_Eclipsing_keys (True) When we find that a smaller key already contains a chunk (usually a peek) of a later variable, what should we do with the smaller variable? When enabled, we simply swap the order of these two variables in causality.
config.Strip_Peek (True) Related to the above: if we detect a swap, discard the smaller (earlier) variable rather than swapping.
config.Prevent_Deep_Stack_Modification (False) Only replace things at a lower height with something at higher height. It is useful mainly for returned values that may be smaller than an earlier variable deeper in the call scope.
Pychain config variables
config.Wide_Trigger (10) Trigger wide search when this number of similar comparisons is done consecutively
config.Deep_Trigger (10) Trigger deep search when this number of unique states is reached for wide search.
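The defaults listed above can be collected in one place as a sketch. The `SimpleNamespace` object below is only a stand-in for the real config module, whose import path is not given here, and `eclipse_action` is a hypothetical helper that encodes one reading of how Swap_Eclipsing_keys and Strip_Peek interact.

```python
from types import SimpleNamespace


def default_config():
    """Return a stand-in config object holding the defaults listed above."""
    return SimpleNamespace(
        Track_Params=True,
        Track_Vars=True,
        Track_Return=False,
        Ignore_Lambda=True,
        Swap_Eclipsing_keys=True,
        Strip_Peek=True,
        Prevent_Deep_Stack_Modification=False,
        Wide_Trigger=10,
        Deep_Trigger=10,
    )


def eclipse_action(config):
    """One reading of the two flags: what happens to the smaller variable."""
    if not config.Swap_Eclipsing_keys:
        return "keep"      # no special handling of eclipsing keys
    if config.Strip_Peek:
        return "discard"   # drop the smaller (earlier) variable
    return "swap"          # swap the two variables in causality
```

For example, setting `Strip_Peek = False` on the returned object switches the assumed behaviour from discarding the earlier variable to swapping the pair.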
Our evaluations used these command lines:
make xeval.urljava MY_RP=0.1 NINPUT=100 NOUT=1000
make xeval.mathexpr MY_RP=0.1 NINPUT=100 NOUT=1000
make xeval.microjson NINPUT=100 NOUT=1000
Example: Math expression with return probability set to 0.1
make xeval.mathexpr MY_RP=0.1
Example: Microjson with verbose logging enabled.
make xeval.microjson NO_LOG=0
Example: Print the bnf for URL
make xbnf.urljava NINPUT=100
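When these runs are scripted, the command lines above can be assembled programmatically. The helper `make_command` below is hypothetical and only builds the argument list; it does not run make. Variable names such as MY_RP, NINPUT, and NOUT are passed through unchanged, exactly as they appear in the examples above.

```python
def make_command(stage, subject, **variables):
    """Build an argument list like ['make', 'xeval.urljava', 'MY_RP=0.1']."""
    argv = ["make", "x%s.%s" % (stage, subject)]
    # Keyword arguments keep their order on Python 3.6 and later.
    for name, value in variables.items():
        argv.append("%s=%s" % (name, value))
    return argv


# The resulting list could be handed to subprocess.run(argv, check=True).
evaluation = make_command("eval", "urljava", MY_RP=0.1, NINPUT=100, NOUT=1000)
print(" ".join(evaluation))
```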