Feature Request: Allowing grammar to filter out constants #76

Closed
davaya opened this issue May 31, 2020 · 2 comments

davaya commented May 31, 2020

Arpeggio returns a parse tree containing every byte of the input text whether it is meaningful or not. Other grammar languages (e.g., https://lark-parser.readthedocs.io/en/latest/tree_construction/#shaping-the-tree) filter out constants in the grammar by default because they don't convey useful information. This is analogous to regular expressions that return match groups but don't return characters outside the groups. (https://docs.python.org/3/howto/regex.html#grouping)
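
The regular-expression analogy can be sketched with Python's standard `re` module. This is only an illustration of the analogy: the pattern and the input string are made up for this example and are not part of Arpeggio or its example grammar.

```python
import re

# Illustrative input: a function header with three parameters.
text = "function fak(n, m, x)"

# The literals "function", "(", "," and ")" must be present for the match
# to succeed, but only the parenthesized groups are returned to the caller.
pattern = re.compile(r"function\s+(\w+)\((\w+),\s*(\w+),\s*(\w+)\)")

match = pattern.fullmatch(text)
groups = match.groups()
print(groups)  # ('fak', 'n', 'm', 'x')
```

The constants still constrain what is accepted; they are simply left out of the result.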

Using https://github.com/textX/Arpeggio/tree/master/examples/simple as an example, if program.simple is modified to have a parameterlist with three symbols (function fak(n, m, x)), the resulting parse tree is:

simpleLanguage [1]: function fak
. parameterlist [13]: ( n , m , x )
. block [23]: {
. . statement [29]:
. . . ifstatement [29]: if (
. . . . expression [33]:
. . . . . operation [33]: n == 0 )
. . . . block [39]: {
. . . . . statement [79]:
. . . . . . returnstatement [79]: return
. . . . . . . expression [86]: 0 ; } else
. . . . block [100]: {
. . . . . statement [161]:
. . . . . . returnstatement [161]: return
. . . . . . . expression [168]:
. . . . . . . . operation [168]: n *
. . . . . . . . . functioncall [172]: fak (
. . . . . . . . . . expressionlist [176]:
. . . . . . . . . . . expression [176]:
. . . . . . . . . . . . operation [176]: n - 1 ) ; } ; }

parameterlist contains seven tokens: "(", "n", ",", "m", ",", "x", and ")". The only information of interest is the three parameters n, m, and x; the punctuation is noise that bloats the tree. Lark discards punctuation by default, but allows rules to preserve it by beginning them with an exclamation mark, noting:

Using the ! prefix is usually a "code smell", and may point to a flaw in your grammar design.

It is possible to use visitors to transform trees into a more useful format, but it shouldn't be necessary in common cases that can be controlled by the grammar.
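
As a rough illustration of the post-processing that is needed today, the sketch below filters constant tokens out of a tree after parsing. It does not use Arpeggio's API: the `Node` class, the `PUNCTUATION` set, and the `drop_constants` function are hypothetical stand-ins, and the tree is built by hand to mirror the parameterlist example above.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical set of grammar constants that carry no information.
PUNCTUATION = {"(", ")", ","}


@dataclass
class Node:
    """Hypothetical non-terminal: a rule name plus child nodes or token strings."""
    rule: str
    children: List[Union["Node", str]] = field(default_factory=list)


def drop_constants(node: Node) -> Node:
    """Return a copy of the tree without the tokens listed in PUNCTUATION."""
    kept = []
    for child in node.children:
        if isinstance(child, Node):
            kept.append(drop_constants(child))
        elif child not in PUNCTUATION:
            kept.append(child)
    return Node(node.rule, kept)


# Hand-built stand-in for the parameterlist node shown above.
parameterlist = Node("parameterlist", ["(", "n", ",", "m", ",", "x", ")"])
filtered = drop_constants(parameterlist)
print(filtered.children)  # ['n', 'm', 'x']
```

Every grammar would need its own variant of this pass, which is the overhead the feature request aims to avoid.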

Feature Request: Add the ability for grammar rules to filter unneeded tokens from the parse tree.

davaya changed the title from "Feature: grammar ignores constants" to "Feature Request: Allowing grammar to filter out constants" on May 31, 2020

igordejanovic (Member) commented

The current work on this feature is on the branch https://github.com/textX/Arpeggio/tree/76-suppress-constant. Most pieces were already in place.

Here is what is implemented so far:

  • Added tree_str method for easy printing of parse trees. See a61fe0f
  • Added class-level suppress attribute for easy override. See 2ff0b58
  • Added syntax_classes parser param for overriding special syntax forms. See 843b779

Please see the docs and tests in the commits above for how it is used. More documentation is needed before this can be considered complete.

igordejanovic (Member) commented

Implemented on the master branch.
