Lookahead not working in OrderedChoice #96

Closed
vprat opened this issue Aug 1, 2022 · 5 comments
vprat commented Aug 1, 2022

Because the And and Not predicates return nothing when parsed, they are ignored by OrderedChoice.

>>> from arpeggio import *
>>> def expr(): return [ Not("a"), "a" ], Optional("b")
... 
>>> p = ParserPython(expr)
>>> p.debug = True
>>> p.parse("b")
>> Matching rule expr=Sequence at position 0 => *b
   >> Matching rule OrderedChoice in expr at position 0 => *b
      >> Matching rule Not in expr at position 0 => *b
         ?? Try match rule StrMatch(a) in expr at position 0 => *b
         -- No match 'a' at 0 => '*b*'
      <<- Not matched rule Not in expr at position 0 => *b
      ?? Try match rule StrMatch(a) in expr at position 0 => *b
      -- No match 'a' at 0 => '*b*'
   <<- Not matched rule OrderedChoice in expr at position 0 => *b
<<- Not matched rule expr=Sequence in expr at position 0 => *b
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "arpeggio/__init__.py", line 1525, in parse
    self.parse_tree = self._parse()
  File "arpeggio/__init__.py", line 1785, in _parse
    return self.parser_model.parse(self)
  File "arpeggio/__init__.py", line 300, in parse
    result = self._parse(parser)
  File "arpeggio/__init__.py", line 379, in _parse
    result = e.parse(parser)
  File "arpeggio/__init__.py", line 300, in parse
    result = self._parse(parser)
  File "arpeggio/__init__.py", line 432, in _parse
    parser._nm_raise(self, c_pos, parser)
  File "arpeggio/__init__.py", line 1727, in _nm_raise
    raise self.nm
  File "arpeggio/__init__.py", line 418, in _parse
    result = e.parse(parser)
  File "arpeggio/__init__.py", line 798, in parse
    result = self._parse(parser)
  File "arpeggio/__init__.py", line 907, in _parse
    parser._nm_raise(self, c_pos, parser)
  File "arpeggio/__init__.py", line 1727, in _nm_raise
    raise self.nm
  File "arpeggio/__init__.py", line 670, in _parse
    e.parse(parser)
  File "arpeggio/__init__.py", line 798, in parse
    result = self._parse(parser)
  File "arpeggio/__init__.py", line 907, in _parse
    parser._nm_raise(self, c_pos, parser)
  File "arpeggio/__init__.py", line 1727, in _nm_raise
    raise self.nm
arpeggio.NoMatch: Expected 'a' at position (1, 1) => '*b'.
@igordejanovic
Member

Not and And are syntactic predicates. I don't think they should be used as sole expressions, only as part of a sequence. Nevertheless, the least we could do is report an error during parser construction if predicates are used in this way.

@vprat
Author

vprat commented Aug 2, 2022

For comparison, Ohm says lookahead is usually used as part of a sequence, but it is not mandatory.

Indeed, the string "b" is parsed by the following grammar in the Ohm editor:

Grammar {
    rule = notaora "b"?
    notaora = ~"a" -- nota
            | "a" -- a
}

@igordejanovic
Member

Thanks for the input. I'll investigate. BTW, do you have any real-world example where you would like to use predicates outside of a sequence? I just can't figure out where it would be useful.

@vprat
Author

vprat commented Aug 2, 2022

We had a grammar where this occurred.
Eventually, we managed to rewrite it to avoid the problem, but I am not sure this is always possible.

igordejanovic added a commit that referenced this issue Apr 1, 2023
This change is backward incompatible. It removes a sort of "soft
failure" used by repetition (if inner expression returns None) to avoid
infinite looping on nested non-consuming matches.

This has led to some subtle problems and inconsistencies which confused
users.

See for example:
#96

But, OTOH, it is now possible to create a grammar which will loop
endlessly by nesting a possibly non-consuming expression (e.g. a regex
which can match empty, Optional, ZeroOrMore...) inside a repetition
(ZeroOrMore, OneOrMore). The parser will loop as the inner expression
will succeed without consuming input, bringing the parser into the same
state over and over.
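
The looping hazard described in this commit message can be sketched without Arpeggio. The code below is an illustrative model, not the library's implementation: parsers are hypothetical functions from `(text, pos)` to a new position or `None` on failure, and all helper names (`lit`, `optional`, `zero_or_more_naive`, `zero_or_more_guarded`) are assumptions made for this example. The guarded variant shows one common way to protect a repetition, by stopping when the inner expression succeeds without advancing; the commit message indicates that Arpeggio itself will loop in this situation.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s and consume it."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def optional(inner: Parser) -> Parser:
    """Always succeed; consume input only if inner matches."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = inner(text, pos)
        return pos if end is None else end
    return parse


def zero_or_more_naive(inner: Parser, max_iterations: int = 1000) -> Parser:
    """Repeat inner for as long as it succeeds.

    max_iterations is a safety cap added only so this demo terminates;
    without it, a non-consuming inner parser would spin forever.
    """
    def parse(text: str, pos: int) -> Optional[int]:
        for _ in range(max_iterations):
            end = inner(text, pos)
            if end is None:
                return pos
            pos = end
        raise RuntimeError("iteration cap reached; inner parser is probably not consuming input")
    return parse


def zero_or_more_guarded(inner: Parser) -> Parser:
    """Repeat inner, but stop as soon as it succeeds without advancing."""
    def parse(text: str, pos: int) -> Optional[int]:
        while True:
            end = inner(text, pos)
            if end is None or end == pos:
                return pos
            pos = end
    return parse


looping = zero_or_more_naive(optional(lit("a")))
guarded = zero_or_more_guarded(optional(lit("a")))

print(guarded("aab", 0))
try:
    looping("aab", 0)
except RuntimeError as err:
    print("naive repetition:", err)
```

In this model the inner `optional(lit("a"))` keeps succeeding at the same position once the `a` characters run out, which is the repeated state the commit message warns about.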
@igordejanovic
Member

Fixed on the master branch.

igordejanovic referenced this issue Apr 1, 2023
If an ordered choice has a potentially non-consuming alternative
(Optional, ZeroOrMore, Not, And), the choice will succeed if that
alternative succeeds and thus will not try further alternatives.
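
The ordered-choice behavior described in this commit message can be illustrated with a small, self-contained sketch. This is not Arpeggio's code: the helper names (`lit`, `not_`, `optional`, `choice`, `seq`) are hypothetical, and each parser is modeled as a function from `(text, pos)` to a new position, or `None` on failure. The point of the model is that success is signaled by the returned position rather than by whether anything was consumed, so a successful `Not` ends the choice.

```python
from typing import Callable, Optional

# A parser takes (text, pos) and returns the new position, or None on failure.
Parser = Callable[[str, int], Optional[int]]


def lit(s: str) -> Parser:
    """Match the literal string s and consume it."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def not_(inner: Parser) -> Parser:
    """Negative lookahead: succeed without consuming iff inner fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if inner(text, pos) is None else None
    return parse


def optional(inner: Parser) -> Parser:
    """Always succeed; consume input only if inner matches."""
    def parse(text: str, pos: int) -> Optional[int]:
        end = inner(text, pos)
        return pos if end is None else end
    return parse


def choice(*alts: Parser) -> Parser:
    """Ordered choice: the first alternative that succeeds wins."""
    def parse(text: str, pos: int) -> Optional[int]:
        for alt in alts:
            end = alt(text, pos)
            if end is not None:  # success, even when end == pos
                return end
        return None
    return parse


def seq(*parts: Parser) -> Parser:
    """Sequence: every part must succeed, each starting where the last ended."""
    def parse(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for part in parts:
            current = part(text, current)
            if current is None:
                return None
        return current
    return parse


# Mirrors the grammar from the report: [Not("a"), "a"], Optional("b")
expr = seq(choice(not_(lit("a")), lit("a")), optional(lit("b")))

print(expr("b", 0))   # Not("a") succeeds at 0, then "b" is consumed
print(expr("ab", 0))  # Not("a") fails, "a" is consumed, then "b"
```

Under this model the input "b" from the original report is accepted, which matches the Ohm behavior mentioned earlier in the thread.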