Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to make choice prefer the first parser? #181

Closed
msrd0 opened this issue Aug 6, 2022 · 16 comments
Closed

How to make choice prefer the first parser? #181

msrd0 opened this issue Aug 6, 2022 · 16 comments

Comments

@msrd0
Copy link

msrd0 commented Aug 6, 2022

I'm trying to parse a simple tag-like language, for example foo{{x}}bar would be a valid input. I tried using choice to parse this, like so:

let tag = any().map(Token::Verbatim).repeated();
choice((
    tag.delimited_by(just("{{"), just("}}")).map(Token::Tag),
    any().map(Token::Verbatim)
))

However, this also parses {{ as two verbatim characters, instead of as a tag delimiter. How can I make choice prefer the first parser?

@zesterer
Copy link
Owner

zesterer commented Aug 6, 2022

choice (and or) will always try the parsers in the order that they appear.

The problem here is that any().map(...).repeated() will accept any character, any number of times, and hence will always succeed and consume all remaining input in the process.

You probably want to use none_of('}').map(Token::Verbatim).repeated() instead, or the take_until parser if you want to allow singular braces.

@msrd0
Copy link
Author

msrd0 commented Aug 6, 2022

Oh ok I see - but that's a problem. If a {{ appears in the input, I must ensure it always takes the first parser, even if there's no matching }} (that would be an error, not a valid input).

@zesterer
Copy link
Owner

zesterer commented Aug 6, 2022

Do you want a {{ without a corresponding }} to be an error?

@msrd0
Copy link
Author

msrd0 commented Aug 6, 2022

Yes, exactly. Like {{ is always an opening delimiter, and no following }} should be an error as missing closing delimiter

@zesterer
Copy link
Owner

zesterer commented Aug 6, 2022

just("{{").ignore_then(take_until(just("}}"))) is probably what you want in that case.

@msrd0
Copy link
Author

msrd0 commented Aug 6, 2022

Ok so I tried this instead of delimited_by:

fn group<T>(
	begin: &'static str,
	end: &'static str,
	parser: impl Parser<char, T, Error = Simple<char>>
) -> impl Parser<char, T, Error = Simple<char>> {
	just(begin).ignore_then(parser).then_ignore(just(end))
}

Calling it like group("{{", "}}", none_of(['}'])) works but trying to do group("{{", "}}", just("foo")) on input {{bar}} still produces verbatim except I need it to fail parsing.

@zesterer
Copy link
Owner

zesterer commented Aug 7, 2022

I'm not really sure what to say. The code you gave wouldn't parse {{bar}} at all, so the problem must be elsewhere in your parser.

@msrd0
Copy link
Author

msrd0 commented Aug 7, 2022

Ok sorry let me give you a full example (the code for Tag will eventually get slightly more complicated but this should do for now):

use chumsky::prelude::*;

fn group<T>(
	begin: &'static str,
	end: &'static str,
	parser: impl Parser<char, T, Error = Simple<char>>
) -> impl Parser<char, T, Error = Simple<char>> {
	just(begin).ignore_then(parser).then_ignore(just(end))
}

#[derive(Debug, Eq, PartialEq)]
enum Token {
	Tag(Tag),
	Verbatim(char)
}

impl Token {
	fn lexer() -> impl Parser<char, Self, Error = Simple<char>> {
		let tag = group("{{", "}}", Tag::lexer()).map(Self::Tag);
		let verbatim = any().map(Self::Verbatim);
		choice((tag, verbatim))
	}
}

#[derive(Debug, Eq, PartialEq)]
struct Tag;

impl Tag {
	fn lexer() -> impl Parser<char, Self, Error = Simple<char>> {
		just("foo").map(|_| Self)
	}
}

fn lexer() -> impl Parser<char, Vec<Token>, Error = Simple<char>> {
	Token::lexer().repeated().then_ignore(end())
}

#[test]
fn parse_tag_foo() {
	let tokens = lexer().parse("{{foo}}").unwrap();
	assert_eq!(&tokens, &[Token::Tag(Tag)]);
}

#[test]
#[should_panic]
fn parse_tag_bar() {
	lexer().parse("{{bar}}").unwrap();
}

#[test]
#[should_panic]
fn parse_tag_foo_unclosed() {
	lexer().parse("{{foo").unwrap();
}

This parses everything that doesn't match all of the parsers of the first choice as the second choice, i.e. accepting all input:

running 3 tests
test parse_tag_bar - should panic ... FAILED
test parse_tag_foo ... ok
test parse_tag_foo_unclosed - should panic ... FAILED

What I want/need/.. (possibly there's another way to express this that I'm not aware of) is to enter the first choice if its first parser matches (the just("{{") one) and then stick with that choice even if later on an error occurs.

@zesterer
Copy link
Owner

zesterer commented Aug 7, 2022

let verbatim = any().map(Self::Verbatim);

This parser accepts any character. It appears in a .repeated() in lexer. Ergo, the parser will always succeed because it can always just repeatedly use verbatim to consume everything.

I guess what you really want is a way to do

any().but_not(just("{{"))

Right now, but_not doesn't exist: but this case (and a few others I've seen recently) are convincing me that maybe it should. I'll try to look into this over this week.

@msrd0
Copy link
Author

msrd0 commented Aug 7, 2022

Yes, I see why choice doesn't work for my case. A but_not parser would certainly work here, eventhough an API like the following would appear more "natural" from a user's perspective:

branch(just("{{"))
    .then(Tag::lexer().then_ignore(just("}}")).map(Self::Tag))
    .or_branch(...)
    .or_else(any().map(Self::Verbatim))

I have no idea how hard this would be to implement or if that's at all feasible.

@zesterer
Copy link
Owner

zesterer commented Aug 7, 2022

The API you suggest would just do the same think as your current parser though. It's any() that's the issue, it needs to know that it's not allowed to parse {{.

@msrd0
Copy link
Author

msrd0 commented Aug 7, 2022

Also, just from a quick look any().but_not(just("{{")) seems a little bit contradicting, as any matches one char and the just matches two chars, but I guess if but_not existed I could modify the parser to accept more than one char at a time if that was required, and if but_not would ensure that any() stopped parsing before it encountered the {{.

@zesterer
Copy link
Owner

zesterer commented Aug 7, 2022

I'm wondering whether I can generalise this a bit further, even, to a.and_is(b.not()). You'd read this as "parses a, if it also parses as not being a b".

@0x7FFFFFFFFFFFFFFF
Copy link

I'm wondering whether I can generalise this a bit further, even, to a.and_is(b.not()). You'd read this as "parses a, if it also parses as not being a b".

Why not use a.and_is_not(b)? It sounds more natural to me.

@zesterer
Copy link
Owner

Because having separate combinators for and_is and not is more composable, allowing more things to be expressed. For example, none_of(...) now just becomes any_of(...).not().

@CraftSpider
Copy link
Collaborator

With rewind and not released, is this issue complete? Plus I believe and_is on zero copy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants