Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About map_slice in zero-copy #243

Closed
bew opened this issue Jan 3, 2023 · 9 comments
Closed

About map_slice in zero-copy #243

bew opened this issue Jan 3, 2023 · 9 comments

Comments

@bew
Copy link
Contributor

bew commented Jan 3, 2023

Extracted from #240 (comment)

I'd be interested in an explanation of what map_slice does, and how/when to actually use it.
I saw one example using it as ...map_slice(|x| x) but it's not clear to me what it is for eyes

Reply from @zesterer:

map_slice is a zero-copy thing, it gives you the slice of the input that the parser covered. For example:

let digits = one_of("0123456789").repeated().map_slice(|x| x);

assert_eq!(digits.parse("743").unwrap(), "743");

Because the parser above is just pulling a slice from the input, no allocation needs to occur! On master, this would require collecting each individual digit into a new String.

Continuing this question/discussion here to avoid 'polluting' that PR

With this info and after digging into the sources, I think I fully understand it now.. and I have to say: IT'S BRILLIANT!

The way you implemented the zero-copy system, with different modes, using check/emit parsing as needed for the different parsers is JUST PERFECT ❤️ ❤️
map_slice (currently undocumented) seems to be one of the core combinator of the zero-copy system, I think it'll be important to emphasis it's importance! (and how it works)


Idea: make it the default for sliceable repeated combinator (and others) ?

Thinking a bit more, have you considered making it more 'visible' (increase the chance people will use it 'by default') by having it integrated into the repeated combinator? (and others that outputs 'lists' of X)
The only consideration I see is that repeated can be used on inputs that are not SliceInput.

👉 Would it be possible(/a good idea?) to make a dedicated repeated implementation that is automatically using map_slice(|x| x) when the input is a SliceInput ?

Question: Handling of ignore_then ?

Can't manage to test it locally (can't find how to type the parser correctly..), so this is hypothesis only

I'm wondering about the handling of something like this:

let hex_digits = one_of("0123456789ABCDEF").repeated().at_least(1);
let verbose_hex = just("0x").ignore_then(hex_digits);

// later
let only_hex_digits = verbose_hex.map_slice(|x| x);

According to the implementation:

fn go<M: Mode>(&self, inp: &mut InputRef<'a, '_, I, E, S>) -> PResult<M, Self::Output, E> {
let before = inp.save();
self.parser.go::<Check>(inp)?;
let after = inp.save();
Ok(M::bind(|| (self.mapper)(inp.slice(before..after))))
}

The cursor is saved before consuming any input from the verbose_hex parser, so before parsing 0x, leading to a slice that include 0x, right?
(and same issue with then_ignore at the end of the pattern, or a whole delimited_by, ..)

If so, it kinda looks like a bug, do you think there's a another way to implement map_slice to work properly?
(without putting the map_slice in verbose_hex)

@zesterer
Copy link
Owner

zesterer commented Jan 4, 2023

map_slice (currently undocumented) seems to be one of the core combinator of the zero-copy system, I think it'll be important to emphasis it's importance! (and how it works)

Yep, definitely! It's the magic sauce that makes this enormous and soul-destroying refactor worth it 😁 I've yet to start on docs for the new stuff, but I'll be sure to put it front & centre!

Idea: make it the default for sliceable repeated combinator (and others) ?

This is probably a good idea. My reason for having repeated ignore the elements by default was because the alternative would be to allocate by collecting them into a container, but map_slice is effectively free (or, at least, the compiler is more than capable of noticing when the slice isn't used in order to elide any overhead that comes with this).

The cursor is saved before consuming any input from the verbose_hex parser, so before parsing 0x, leading to a slice that include 0x, right?

Yes, that is correct. You're right that it might be somewhat unintuitive at first, but this is already the behaviour of map_with_span: the span covers the entire parser's input, including the 0x, and I don't think there's a reliable or consistent way to avoid this being the case.

If you want to only see the slice of the input after the 0x, you can change the implementation as follows:

let hex_digits = one_of("0123456789ABCDEF").repeated().at_least(1).map_slice(|x| x);
let verbose_hex = just("0x").ignore_then(hex_digits);

// later
let only_hex_digits = verbose_hex;

One thing you have made me consider is whether map_slice should instead become map_with_slice, similar to map_with_span, such that you can get at the output of the inner parser. What do you think?

@bew
Copy link
Contributor Author

bew commented Jan 7, 2023

One thing you have made me consider is whether map_slice should instead become map_with_slice, similar to map_with_span, such that you can get at the output of the inner parser. What do you think?

Hmm but that would change it to an emitting parser, not just zero-copy checking, no?
(Maybe have both?)

@zesterer
Copy link
Owner

zesterer commented Jan 7, 2023

It would, yes. Maybe both are needed.

@CraftSpider
Copy link
Collaborator

Some thoughts on repeated and similar returning slice by default: it's possible to add a special default type (Say, TokenSlice), which doesn't implement Container and so has a unique implementation of Parser, which returns the slice instead of collecting. I've played around with implementing this, and it's not hard at all, though may be a little confusing, since the slice collection will ignore the return value of the wrapped parser.

@zesterer
Copy link
Owner

You have a point that this could produce confusing behaviour. I've also been more broadly concerned about this with repeated defaulting to collecting into ().

One alternative might be to have repeated infer the collection type, although it's not clear to me how well this would work in practice.

@CraftSpider
Copy link
Collaborator

One idea is to only make it work when the result of the inner parser is I::Token. On the other hand, I don't know a way to have that not make the trait errors for missing collect possibly worse than they already are. Alternatively, make it use inference, and add a method collect_slice when the output is tokens. Finally, do nothing, and just increase the visibility of map_slice

@zesterer
Copy link
Owner

This is now pretty much solved for the fact that collecting into a separate type is an explicit operation on the IterParser trait. Happy to re-open if it's believed there's still some ambiguity here.

@bew
Copy link
Contributor Author

bew commented Feb 21, 2023

Yay the IterParser looks really cool!

I think there might still be something to do / experiment with:

The cursor is saved before consuming any input from the verbose_hex parser, so before parsing 0x, leading to a slice that include 0x, right?

Yes, that is correct. You're right that it might be somewhat unintuitive at first, but this is already the behaviour of map_with_span: the span covers the entire parser's input, including the 0x, and I don't think there's a reliable or consistent way to avoid this being the case.

(writing plainly, as I'm not sure how to write it yet, I can try to expand on this later if needed)
With the new ability to pass state/context/configuration to parsers, wouldn't there be a way to pass some kind of reference to input offset, that can be moved appropriately when some parser is ignored ?
That would allow the initial saved offset to move forward if the first parser is actually ignored at some point.

What do you think?

@zesterer
Copy link
Owner

I'm still not quite sure what's being asked, sorry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants