Pygmalion

Requirements

Pygmalion depends on a number of packages, all of which can be installed using the following command:

$ make req

Note that we have strict requirements for Python: Pygmalion is tested only on Python 3.6.5. It may work on later versions, but it will not work on earlier versions because we rely on constructs introduced in 3.6.
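The version requirement above can be checked before running anything else. The following is a minimal sketch, not part of Pygmalion; the names `MIN_PYTHON` and `python_is_supported` are made up for illustration.

```python
import sys

# Minimum version taken from the README text; the check itself is illustrative.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is at least Python 3.6."""
    return tuple(version_info[:2]) >= MIN_PYTHON


# Stop early on interpreters older than 3.6.
if not python_is_supported():
    raise SystemExit("Pygmalion needs Python 3.6 or later")
```

Versions later than 3.6.5 pass this check, but as noted above they are untested.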

Subjects

We have the following subjects under the ./subjects directory

  • hello.py
  • array.py
  • microjson.py
  • urljava.py
  • mathexpr.py
  • expr.py
  • number.py

Complete end-to-end run for subject hello.py

First, generate the inputs, then derive and evaluate the grammar

$ make xeval.hello

Next, get the human readable grammar

$ make xbnf.hello

Stages

We have the following stages

  • chain Generates the initial inputs using PyChains

  • trace Runs the generated inputs through a Python frame tracer. This is the only Python-specific part. Unfortunately, because we mess with settrace, debugging is not available. Hence the next phase is separated out.

  • track Evaluate the dumped frames to ascertain scopes and causality rules. We essentially retrieve the input stack information from the dumped frames.

  • mine Mine the parse tree from the stack frames. These are still input specific (hence the parse tree)

  • infer Generate the context-free grammar by merging the parse trees. At this point, we can no longer distinguish separate inputs.

  • refine Try to produce human readable grammar

  • fuzz Generate output from the inferred and refined grammar.

  • eval Use the outputs generated and find how many are valid, and the amount of coverage obtained

  • bnf This is not a stage for grammar evaluation, but can be used to generate human readable grammars from the refined grammar (depends on refine)

Each stage can be invoked as a make target of the form x<stage>.<subject>. For example, for complete evaluation of microjson.py, the following command would be used

$ make xeval.microjson

On the other hand, if only the human readable grammar is necessary, the following command is used

$ make xbnf.microjson
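The stage list above reads as a pipeline in which each stage consumes the output of the one before it. The sketch below works out which stages a given target would need under that assumption; the real dependencies live in the Makefile and may differ. `STAGES` and `stages_needed` are hypothetical names, not part of Pygmalion.

```python
# Stage names in the order the README lists them.
# Assumption: each stage depends on the one listed before it.
STAGES = ["chain", "trace", "track", "mine", "infer", "refine", "fuzz", "eval"]


def stages_needed(target):
    """Return the stages assumed to run, in order, for a target stage."""
    if target == "bnf":
        # The README states that bnf depends on refine and is not
        # itself part of the evaluation pipeline.
        return stages_needed("refine") + ["bnf"]
    return STAGES[: STAGES.index(target) + 1]


print(stages_needed("bnf"))
```

Under this reading, asking for xbnf skips fuzz and eval entirely, which is why it is the cheaper target when only the readable grammar is wanted.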

Generating initial inputs using PyChains

The following command generates the initial inputs using PyChains for hello.py

$ make xchain.hello

The result is placed in .pickled/hello.py.chain and can be converted to readable ASCII by

$ ./bin/showpickle.py .pickled/hello.py.chain
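For inspecting a result from Python rather than from the shell, something along these lines may work. It is a sketch that assumes the .chain file is an ordinary pickle holding a single object; it is not the code of bin/showpickle.py, and `show_pickle` is a hypothetical name.

```python
import pickle
import pprint


def show_pickle(path):
    """Load one pickled object from path and return a readable rendering."""
    # Assumption: the file was written with the standard pickle module.
    with open(path, "rb") as f:
        data = pickle.load(f)
    return pprint.pformat(data)
```

A call such as `print(show_pickle(".pickled/hello.py.chain"))` would then print the stored inputs. Only unpickle files you generated yourself, since loading a pickle can execute arbitrary code.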

A number of environment variables are used to control Pygmalion

  • MY_RP Used to indicate how to proceed when an input is accepted. Some subjects such as urlpy.py and urljava.py allow single character inputs that should be extended to produce larger inputs. Default is 1.0. Choose MY_RP=0.1 for urljava.py to get reasonable URLs

  • NINPUT The number of inputs that the Chain should produce before stopping. Default is 10.

  • R The random number seed. The default is 0.

  • NOCTRL Whether to produce characters such as \t\b\f\x012 which are not part of the list string.ascii_letters + string.digits + string.punctuation

  • NO_LOG (1) If set to 0, we get more informative and verbose output (which slows down the program quite a bit).

  • python3 The python interpreter used

  • pip3 The pip installer command
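To make the defaults above concrete, here is a minimal sketch of reading these variables from the environment. It is not Pygmalion's own code: `read_settings`, `PRINTABLE`, and `uses_only_listed_chars` are hypothetical names, and treating R as an integer is an assumption. NOCTRL is left out because no default is documented for it.

```python
import os
import string

# The character list the README names in the NOCTRL entry.
PRINTABLE = string.ascii_letters + string.digits + string.punctuation


def read_settings(env=os.environ):
    """Collect the documented variables, falling back to the README defaults."""
    return {
        "MY_RP": float(env.get("MY_RP", "1.0")),  # default 1.0
        "NINPUT": int(env.get("NINPUT", "10")),   # default 10
        "R": int(env.get("R", "0")),              # seed, assumed to be an int
        "NO_LOG": int(env.get("NO_LOG", "1")),    # 0 means verbose output
    }


def uses_only_listed_chars(text):
    """Return True when text contains none of the extra control characters."""
    return all(c in PRINTABLE for c in text)
```

Passing a plain dictionary instead of `os.environ` makes it easy to try different combinations, for example `read_settings({"MY_RP": "0.1", "NINPUT": "100"})`.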

The configuration can be fine-tuned further by modifying the pygmalion.config and pychains.config variables listed below

Pygmalion config variables

  • config.Track_Params (True) Whether to track function parameters or not

  • config.Track_Vars (True) Whether to track local variables or not

  • config.Track_Return (False) Should we insert a special return variable from each function?

  • config.Ignore_Lambda (True) Strip out noise from lambda expressions

  • config.Swap_Eclipsing_keys (True) When we find that a smaller key already contains a chunk (usually a peek) of a later variable, what should we do with the smaller variable? When enabled, we simply swap the order of these two variables in causality

  • config.Strip_Peek (True) Related to above -- If we detect a swap, rather than swap, simply discard the smaller (earlier) variable.

  • config.Prevent_Deep_Stack_Modification (False) Only replace things at a lower height with something at higher height. It is useful mainly for returned values that may be smaller than an earlier variable deeper in the call scope.

PyChains config variables

  • config.Wide_Trigger (10) Trigger wide search when this number of similar comparisons is done consecutively

  • config.Deep_Trigger (10) Trigger deep search when this number of unique states is reached for wide search.
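One way to read the two thresholds above is as a pair of counters: one for consecutive similar comparisons, one for unique states seen during wide search. The sketch below follows that reading only; it is not taken from PyChains, and the class and method names are hypothetical.

```python
class SearchTriggers:
    """Hypothetical counters mirroring Wide_Trigger and Deep_Trigger."""

    def __init__(self, wide_trigger=10, deep_trigger=10):
        self.wide_trigger = wide_trigger
        self.deep_trigger = deep_trigger
        self.similar_run = 0      # consecutive similar comparisons so far
        self.wide_states = set()  # unique states seen during wide search

    def record_comparison(self, similar):
        """Return True once enough similar comparisons occur in a row."""
        # A dissimilar comparison breaks the run and resets the counter.
        self.similar_run = self.similar_run + 1 if similar else 0
        return self.similar_run >= self.wide_trigger

    def record_wide_state(self, state):
        """Return True once wide search has seen enough unique states."""
        self.wide_states.add(state)
        return len(self.wide_states) >= self.deep_trigger
```

With the defaults of 10 and 10, the tenth similar comparison in a row would switch to wide search, and the tenth distinct state seen there would switch to deep search.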

Examples

Our evaluations used these command lines.

make xeval.urljava MY_RP=0.1 NINPUT=100 NOUT=1000
make xeval.mathexpr MY_RP=0.1 NINPUT=100 NOUT=1000
make xeval.microjson NINPUT=100 NOUT=1000

Example: Math expression with return probability set to 0.1

make xeval.mathexpr MY_RP=0.1

Example: Microjson with logging set.

make xeval.microjson NO_LOG=0

Example: Print the bnf for URL

make xbnf.urljava NINPUT=100
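When running several of these commands from a script, the command lines can be assembled programmatically. The helper below is a sketch and not part of the repository; `make_command` is a hypothetical name, and the variables are passed through exactly as given.

```python
def make_command(stage, subject, **variables):
    """Return the argv list for one Pygmalion make invocation."""
    argv = ["make", "x{}.{}".format(stage, subject)]
    # Keyword arguments keep their order on Python 3.6 and later.
    argv += ["{}={}".format(name, value) for name, value in variables.items()]
    return argv


# The three evaluation runs listed above.
EVALUATIONS = [
    make_command("eval", "urljava", MY_RP=0.1, NINPUT=100, NOUT=1000),
    make_command("eval", "mathexpr", MY_RP=0.1, NINPUT=100, NOUT=1000),
    make_command("eval", "microjson", NINPUT=100, NOUT=1000),
]

for argv in EVALUATIONS:
    print(" ".join(argv))
```

Each list can be handed to `subprocess.run(argv, check=True)` from the repository root to execute the run.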