Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewind parser #33

Merged
merged 4 commits into from Nov 25, 2021
Merged

Rewind parser #33

merged 4 commits into from Nov 25, 2021

Conversation

ryo33
Copy link
Contributor

@ryo33 ryo33 commented Nov 24, 2021

Hi, continuously I perfectly enjoying parsing with this amazing crate. Today I introduce lookahead combinator.
I don't believe that this should be merged, but this is a little bit convenience for some case. Also, I am not confident on the naming and design of this new combinator.

Motivation

I want to parse something like this;

a b c - d

with

word
    .repeated()
    .then(word
        .then_ignore(just('-'))
        .then(word))

But this fails at -.

So I introduce lookahead and change the parser like below.

word
    .then_ignore(none_of("-".chars()).lookahead())
    .repeated()
    .then(word
        .then_ignore(just('-'))
        .then(word))

Compare to other designs

I choose the one I think it's most fits to chumsky's APIs.

word.lookahead(none_of("-".chars()))

This is like very natural, but always ignores the outputs of the lookahead parser.
In some case, we may want not to ignore, such as infix attribute syntax that have influence to the both side.

word.then_ignore(lookahead(none_of("-".chars())))

This is also natural, and allows us to choose whether ignore the output or not.
However, this is odd in chumsky's conventions.

@zesterer
Copy link
Owner

Hi, thanks for the PR! Just so I can get a few things straight about exactly what problems you encountered:

But this fails at -.

This doesn't look like a pattern that should fail, unless word also accepted - as an input. Is this the case?

The PR seems like the parse succeeds on valid input, but reverts to the start of this input. Is this the intended behaviour?

@ryo33
Copy link
Contributor Author

ryo33 commented Nov 24, 2021

@zesterer Thanks for the review.

This doesn't look like a pattern that should fail, unless word also accepted - as an input. Is this the case?

I forgot to write that the word parser only parses alphanumerics.

The PR seems like the parse succeeds on valid input, but reverts to the start of this input. Is this the intended behavior?

Yes, this is intended.

Certainly, It looks like it could be a success, and the introduced parser is unnecessary.
I may do something wrong or run into a pitfall. I'm going to reproduce the condition accurately.

Comment on lines 1138 to 1150
#[test]
// TODO: this fails
fn reproduce_repeated_with_infix() {
let a = just::<_, Simple<char>>('a');
let parser = a
.clone()
.repeated()
.then(a.clone().then_ignore(just('-')).then(a))
.then_ignore(end());

assert_eq!(parser.parse("aaa-a"), Ok((vec!['a', 'a'], ('a', 'a'))));
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added that reproduces the condition I met and I want to solve.

@zesterer
Copy link
Owner

What I'm worried about is the extent to which this combinator generalises and the ergonomics of it. For me, it took me quite a while to understand exactly what the code in your example was actually doing.

I think it's probably the case that this case of overzealous parsing is only hit when a parser consumes too much of the input to the right of its intended pattern. If this is the case, may I recommend a rename to with_stop and with a slight change in logic?

word
    .with_stop(word)
    .repeated()
    .then(word.then_ignore(just('-')).then(word))

@ryo33
Copy link
Contributor Author

ryo33 commented Nov 25, 2021

I had a little survey on this, the lookahead combinator is a common use-case, I think.
Parsec has a lookahead combinator, so combine crate also has it (combine refers to parsec on its API).
There is also a similar combinator try. Contrary to lookahead, try make a parser not to consume inputs when failed.
Therefore all chumsky parsers are already try. But there are no combinators that make a parser not consume inputs even when successful.

I roughly agree the lookahead API I suggested is difficult to understand. How about parser.rewind() or parser.peek()
I think parser.lookahead() is intuitive if get used to, like parser.ignored(). Also, I prefer lookahead because it is a common term.

Also, How about word.not_followed_by(just('-'))?

@zesterer
Copy link
Owner

rewind seems like a suitable name to me. We'll have to make sure that the documentation for these functions is good because their use isn't particularly intuitive.

Also, How about word.not_followed_by(just('-'))

We already have .then_ignore(..), so how about .then_not(..)?

@ryo33 ryo33 changed the title Lookahead parser Rewind parser Nov 25, 2021
Comment on lines +956 to +972
/// Parse something and then rewind the state so that it only looks ahead and does not consume input.
/// Chumsky's parsers are always rewind the state when they fail.
/// But this combinator makes a parsers to rewind the state whether it succeeds or fails.
/// A typical use-case of this is that you want to parse something which is not followed by something else.
///
/// # Examples
///
/// ```
/// # use chumsky::prelude::*;
/// let just_numbers = text::digits::<_, Simple<char>>(10)
/// .padded()
/// .then_ignore(none_of("+-*/".chars()).rewind())
/// .separated_by(just(','));
/// // 3 is not parsed because it's followed by '+'.
/// assert_eq!(just_numbers.parse("1, 2, 3 + 4"), Ok(vec!["1".to_string(), "2".to_string()]));
/// ```
fn rewind(self) -> Rewind<Self>
Copy link
Contributor Author

@ryo33 ryo33 Nov 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to rewind, and made the documentation slightly better.


#[test]
// TODO: this fails
fn reproduce_repeated_with_infix() {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need a #[should_fail]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but I deleted the test in my latest commit.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently not.

@zesterer
Copy link
Owner

Thanks for your work on this PR!

@zesterer zesterer merged commit a8ddc40 into zesterer:master Nov 25, 2021
@ryo33
Copy link
Contributor Author

ryo33 commented Nov 25, 2021

Thank you! This makes my programming language more powerful 💪

@ryo33 ryo33 deleted the feature/lookahead branch November 25, 2021 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants