incorrect behavior when parsing unordered group in clean PEG #43

Closed
kdahlhaus opened this issue May 9, 2018 · 7 comments

@kdahlhaus

kdahlhaus commented May 9, 2018

The unordered group will not be parsed correctly in a multi-line grammar unless a backslash is the last character on the line in the grammar.

This is how you would expect to write the grammar (no backslash):

from arpeggio.cleanpeg import ParserPEG

print ParserPEG("""
  letters = "{" ("a" "b")#  "}"
""", "letters").parse(""" { b a } """)

This incorrectly throws an exception: arpeggio.NoMatch: Expected 'a' at position (1, 4) => ' { *b a } '.

Adding a backslash as the last character on the grammar line fixes the parsing:

print ParserPEG("""
  letters = "{" ("a" "b")#  "}" \
""", "letters").parse(""" { b a } """) 

This correctly prints: { | b | a | }
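For context, the only difference between the two grammar strings is a Python-level one: inside a non-raw triple-quoted string, a backslash immediately followed by a newline is a line continuation, so both characters are dropped from the string. The sketch below uses only plain Python (no Arpeggio) and the variable names are illustrative; it shows that the second grammar reaches the parser without a trailing newline.

```python
# The grammar exactly as written in the failing example.
without_backslash = """
  letters = "{" ("a" "b")#  "}"
"""

# The grammar as written in the working example. The backslash-newline
# pair is a line continuation and is removed by Python itself.
with_backslash = """
  letters = "{" ("a" "b")#  "}" \
"""

print(repr(without_backslash))
print(repr(with_backslash))
```

Comparing the two `repr` outputs shows that the grammar text is otherwise identical, so whatever goes wrong in the first case depends on the newline that follows the `#` line.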

I'm using Arpeggio 1.7.1 installed from pip under Python 2.7 in Windows.

@kdahlhaus
Author

kdahlhaus commented May 9, 2018

Here's a unit test that shows the problem and how to work around it on a single line by re-ordering the grammar. I hope that workaround helps to identify the problem. To run the test, rename the attachment back to a .py extension.

test_unordered_group.py.txt

@igordejanovic igordejanovic self-assigned this May 10, 2018
@igordejanovic
Member

Thanks for reporting. I've verified it. It is a bug that seems to affect only cleanpeg notation.

igordejanovic added a commit that referenced this issue May 10, 2018
Backward incompatible change. Line comments changed from `# ...` to `//...`.
@igordejanovic
Member

The problem was a conflict between the unordered group operator symbol # and the line comment in cleanpeg notation, which unfortunately used the same symbol. I opted for the easiest and most pragmatic solution, although it introduces a slight backward-incompatible change for cleanpeg grammars. The line comment pattern is now changed from #... to //... as in regular peg notation. The fix is on the master branch, so you can test it and report back if you have any more problems with this.
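To illustrate the conflict, here is a minimal standalone sketch using only Python's `re` module. The two patterns are assumptions made for illustration (a `#` or `//` comment that runs to the end of the line and is terminated by a newline); they are not copied from Arpeggio's source, and a global substitution is a simplification of how a parser skips comments. The names `OLD_COMMENT`, `NEW_COMMENT` and `strip_comments` are hypothetical.

```python
import re

# Assumed comment patterns, for illustration only (not taken from Arpeggio).
OLD_COMMENT = re.compile(r"#.*\n")   # '#' to end of line
NEW_COMMENT = re.compile(r"//.*\n")  # '//' to end of line


def strip_comments(grammar, comment_pattern):
    """Replace each comment match with a bare newline.

    This only approximates a parser that skips comments between tokens.
    """
    return comment_pattern.sub("\n", grammar)


# The grammar line from the report, with the trailing newline it had.
grammar = 'letters = "{" ("a" "b")#  "}"\n'

# With '#' comments, the operator and the closing "}" are swallowed.
old_view = strip_comments(grammar, OLD_COMMENT)

# With '//' comments, the '#' operator is left untouched.
new_view = strip_comments(grammar, NEW_COMMENT)

# Without a trailing newline, the assumed '#' pattern cannot match at all.
no_newline_view = strip_comments(grammar.rstrip("\n"), OLD_COMMENT)

print(repr(old_view))
print(repr(new_view))
print(repr(no_newline_view))
```

Under these assumptions, the old comment rule leaves only an ordered sequence `"a" "b"` with no closing `"}"`, which is consistent with the reported `Expected 'a'` error. The last case may also explain why ending the grammar line with a backslash worked around the problem, since the grammar string then has no newline after the `#`.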

@codeyash

codeyash commented May 10, 2018

Alternatively, the unordered group operator could be a double # (##), or any other symbol not yet in use, such as ~ or ^.

'#' is a good operator for comments, in my opinion :)

@igordejanovic
Member

Yeah, we could change it, but that would be a backward-incompatible change that is harder to track down in complex grammars, and it would make cleanpeg syntax differ more from regular peg, which is a bad thing IMHO. This change to the comment syntax actually makes cleanpeg more similar to regular peg without reducing the readability of cleanpeg. textX also uses # for unordered groups, which is another good reason to follow that notation.

Anyway, Arpeggio makes it relatively easy to make your own grammar language notation by following how it is done for peg/cleanpeg :)

@codeyash

Okay, Yea agreed!

@kdahlhaus
Author

That fix passes my unit test and works in my project. (FYI: the version is still at 1.7.1.)

Thanks!
