Bug in `attempt`? #15

Open
ath opened this Issue Sep 20, 2012 · 10 comments

Projects

None yet

4 participants

@ath
ath commented Sep 20, 2012
;; works as expected
(run (attempt (p/char \space)) " ")
==> \space

;; expected nil
(run (attempt (p/char \space)) "")
==> Unexpected end of input at line: 1 column: 1
  [Thrown class java.lang.RuntimeException]

;; expected nil
(run (attempt (p/char \space)) "x")
==> Unexpected token 'x' at line: 1 column: 1
  [Thrown class java.lang.RuntimeException]

From the docstring “A parser that will attempt to parse p, and upon failure never consume any input” it was not clear to me, that attempt would throw an Exception.

@sjl
sjl commented Sep 20, 2012

The docstring is technically right, but I agree that it's a bit jarring at first.

attempt is for things like this:

(defparser hello []
  (char \H)
  (char \e)
  (char \l)
  (char \l)
  (char \o)
  (always :hello))

(defparser hi []
  (char \H)
  (char \i)
  (always :hi))

(run (choice (hello)
             (hi))
     "Hello!")
; :hello

(run (choice (hello)
             (hi))
     "Hi!")
; RuntimeException...

(run (choice (attempt (hello))
             (hi))
     "Hi!")
; :hi

The first run works because it just parses Hello.

The second run fails because the first one consumes and parses the H just fine,
then errors out on the e. choice moves to the next parser on the list, but
the H has already been consumed, so it tries to parse "i!" with hi, which
fails.

Wrapping the hello parser in an attempt lets you ignore the fact that it
consumed input.

@ath
ath commented Sep 20, 2012

Why does choice consume anything if it doesn’t match? I would have expected some kind of backtracking. In the second example it can match the H for Hi!, but then fails. So it backtracks and tries again with (hi), which is successful.

@ath
ath commented Sep 20, 2012

And I am still not sure why attempt should throw exceptions upon some failures, while on others it doesn’t. In your example I would assume from my tests that (attempt (hello)) should throw an Exception, because it can’t parse the input.

@sjl
sjl commented Sep 20, 2012

Why does choice consume anything if it doesn’t match? I would have expected
some kind of backtracking. In the second example it can match the H for Hi!,
but then fails. So it backtracks and tries again with (hi), which is
successful.

I'll leave this for @youngnh to answer since I'm still a bit shaky on this.

I am still not sure why attempt should throw exceptions upon some failures,
while on others it doesn’t. In your example I would assume from my tests that
(attempt (hello)) should throw an Exception, because it can’t parse the
input.

Okay, first, attempt never throws anything. The only place the
RuntimeException is thrown (aside from the special case of sanity-checking many
and zero-element parsers) is in run:

(defn run
  [p input]
  (let [result (run-parser p ...)]
    (condp instance? result
      Ok (:item result)
      Err (throw (RuntimeException. ^String (:errmsg result))))))

Basically, if the parser you give to run results in an Ok, the result is
unpacked and returned normally. If it returns an error, the exception is
thrown.

So:

(def hello []
  (string "hello")
  (always :hello))

(def hi []
  (string "hi")
  (always :hi))

First case:

(run (hello) "hello")
; :hello

Matches fine, returns result. All good. Next:

(run (hello) "hi")
; RuntimeException

The (hello) returned an error. run threw a runtime exception.

(run (attempt (hello)) "hi")
; RuntimeException

The (attempt (hello)) returned an error. run threw a runtime exception.

(run (choice (hello)
             (hi))
     "hi")
; RuntimeException

The (hello) returned an error. It also consumed the "h" before it got to the
thing that caused the error. (hi) tried to parse "i" and returned an error,
so (choice ...) returned an error. run threw a runtime exception. This is
the part I'm a bit shaky on and need @youngnh to explain further.

(run (choice (attempt (hello))
             (hi))
     "hi")
; :hi

The (attempt (hello)) returned an error, but did not consume any input.
(hi) parsed the "hi" and returned Ok. run unwraps the result and returns
it.

@ath
ath commented Sep 20, 2012

Okay good, thanks for the clarification. It was run who threw the Exception.
So now it is clear why my example above resulted in one. The attempt tried to parse it, but was unsuccessful. Now run ran out of options and couldn’t parse my "" or "x" inputs.

What is the difference between these two?

(run (either (char \x) (char \y)) "y")
(run (either (attempt (char \x)) (char \y)) "y")

So, attempt is the “optional” operator. It tries to match if possible, but if it can’t match, then it consumes nothing and is effectively a null-operation. Is that correct?

@sjl
sjl commented Sep 20, 2012

Okay, after rewatching the talk to refresh my memory of the internals, I think
it's a tiny bit different than I mentioned above.

(run (choice (hello)
             (hi))
     "hi")
; RuntimeException

In that example, hello effectively returned "error, and I consumed a value".
choice then immediately returned a value without even trying (hi), becuase
it know that it has to try each option at the same spot, and now that it's moved
forward it can't do that any more.

Okay, now in your last example:

(run (either (char \x) (char \y)) "y")

This first tries the (char \x) parser. It returns an error, but (importantly)
it did not consume any input. So choice is still free to try the next option
at the correct location.

If you chained two characters together and the first one matched but the second
one didn't, then it would have consumed input.

But because (char \whatever) only tries to parse a single thing, on its own
it never consumes input unless it succeeds.

(run (either (attempt (char \x)) (char \y)) "y")

The attempt here is really a no-op. It's not going to hurt anything (maybe
a tiny performance hit of course), but since it's wrapping something that can't
possibly consume input if it fails
, it's not necessary.

Back to my example:

(defparser hello []
  (char \H)
  (char \e)
  (char \l)
  (char \l)
  (char \o)
  (always :hello))

Now defparser wraps all of its body in a >> for you. So this is exactly
equivalent (at least for this question):

(def hello
  (>> 
    (char \H)
    (char \e)
    (char \l)
    (char \l)
    (char \o)
    (always :hello)))

Because this is parsing more than one thing at a time, and consumes/parses each
bit before moving onto the next, and can fail anywhere along the way, it may end
up consuming input. That's why we need the attempt in a choice.

Now, with all that said, I think it would be a lot more intuitive to use if
(choice foo bar) wrapped its arguments in (attempt)s for you. I don't see
any downside to that and it'd make things a lot easier to use.

@ath
ath commented Sep 20, 2012

Yes agreed, if it automatically wrapped its args, then it would give the backtracking effect.

@youngnh
Owner
youngnh commented Sep 21, 2012

Backtracking is a common request and common source of confusion. The Parsatron doesn't do it by default because Haskell's parsec doesn't do it and this lib started out as a straight port. As things currently stand, users have to manually implement backtracking using parsers like attempt and lookahead just as you worked out above.

There is something natural that matches people's expectations about backtracking. Its the way that new users of the library expect things to work, so there is an argument there that The Parsatron should implement it and people's programs will work as they want them to without muss or fuss.

At the same time backtracking can incur very high costs in the case of long parse paths that fail close to the end, in highly branched parsers, or in parsers that attempt a match multiple times (like many). These problems are subtle and if your parser is slow, require knowledge of the implementation in order to diagnose.

Right now, I think I'm leaning towards creating a second namespace in The Parsatron with backtracking equivalents of the functions in the current namespace. So that users may switch between implementations simply by changing the :use or :require definitions at the top of their file.

@ath
ath commented Sep 21, 2012

Maybe this is the right way to approach this. A new NS that can be used for testing. People could use the more comfortable one and see how it works out, and in case something is not going fast enough, users with more insight can switch back to the more low-levelish implementation.
But I guess there is still room for more efficiency anyway. And that may lead to some uglieness…
As soon we have +/- copy pasted code, it may require changes at two places, if bugfixes or speed improvements are being applied.
Right now though I would still prefer to have auto-attempting versions of either/choice.

@mullr
mullr commented Aug 5, 2013

I've just run into the backtracking problem myself. I think the issue is that it's natural to expect the combinators to behave opaquely; having one branch of a choice fail but still consume input is very strange.

One option is to add something like choice* from #26 to the library; this would make the distinction clear and probably ease this bump for new users.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment