Skip to content

westphallm1/PyRD

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

PyRD

PyRD is a recursive descent parser generator with backtracking implemented in Python.

Installation

PyRD can be installed by running python setup.py install from the top level of the repository.

Command Line Usage

PyRD can be invoked as either a python module with python -m pyrd <input.grammar> <output.py> or as a standalone program with pyrdg <input.grammar> <output.py>. The syntax of <input.grammar> is described in the next section, and <output.py> will be generated as an importable Python file.

.grammar Syntax

.grammar files consist of 2 sections, separated by the marker %%. The first section consists of a list of the parser's production rules, and the second section consists of raw Python code that will be appended to the end of the generated parser. Prepending to the generated parser is currently not supported. Production rules are specified as follows:

production_id :: list of space-separated subparsers {raw python code}
    | another list of subparsers {raw python code}
    | more subparsers {raw python code};

Production rules are tried sequentially, so if the first rule belonging to an id succeeds the second will not be run. The values returned by any subparser in a production rule are populated by name into that production rule's python code, eg:
list_index :: list_ "[" index "]" { return list_[index] };
Two special parsers for string literals and regexes can be accessed with "characters-to-match" and /regex-to-match/, respectively. Their results can be accessed by indexing the special variable parsed in their production rule's python code, eg:
int_in_brackets :: "[" /-?[0-9]+/ "]" {return int(parsed[1])};
The python code between brackets is run whenever the production rule succeeds, and is used as that parser's value when called from another production rule. It is substituted as-is into the generated parser, so care must be taken to comply with Python's indentation rules.

An example parser that finds the maximum or minimum of a comma-separated list of integers might look like:

list     :: operator "(" items ")" {return ops[operator](items[::-1])};
operator :: /(min|max)/ {return parsed[0]};
items    :: num "," items {items.append(num); return items}
          | num {return [num]};
num      :: /-?[0-9]+/ {return int(parsed[0])};
%%
ops = {
  "min":min,
  "max":max
}
if __name__ == '__main__':
    parsed = List().parse(input())
    print(parsed)

The complete description for .grammar files can be found in examples/grammar.grammar

Examples

The examples/ directory contains several example grammars for PyRD. Examples can be compiled and tested as follows:

$ pyrdg examples/calc.grammar calc.py
Grammar parsed successfully.
$ echo "1 + 2 * 3" | python calc.py
7

Benchmarking

PyRD does not generate efficient parsers. The following parse times were achieved with a json parser generated by PyRD (examples/json.grammar):

File Size (KB) Runtime (PyRD) Runtime (builtin JSON parser)
10 0.120s < 0.1s
30 0.257s < 0.1s
70 0.548s < 0.1s
140 1.193s < 0.1s
400 5.521s < 0.1s
1000 44.8s 0.1s

Future Work

  • Support more flexible naming of production rule ids.
  • Document PyRD Classes.
  • Improve runtime by implementing some form of LL(k) prediction.

About

Recursive descent parser generator in Python

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published