Zero copy cleanup #190

Zij-IT · 2022-08-13T19:32:42Z

Since I am super excited about the upcoming Zero-Copy implementation, I decided to try and contribute a little back in the form of some small cleanup commits.

Most of these are bits that I found while perusing the code and thought could be done a tad more simply. Let me know if I missed something, or if something needs changing!

Thanks for your time!

* Option magic ;)

…erator * This also exists on master

zesterer · 2022-08-13T21:04:30Z

Hey, this looks like great work! Is there any chance you could switch the target branch to zero-copy?

Zij-IT · 2022-08-13T22:03:48Z

Whoops! Totally thought I had it set to zero-copy. It's fixed!

zesterer · 2022-08-15T10:15:53Z

src/zero_copy/input.rs

-        if offset < self.len() {
-            // TODO: Can we `unwrap_unchecked` here?
-            let c = unsafe { self.get_unchecked(offset..).chars().next().unwrap() };
-            (offset + c.len_utf8(), Some(c))
-        } else {
-            (offset, None)
-        }
+        let chr = self.chars().skip(offset).next();
+        (offset + chr.map_or(0, char::len_utf8), chr)


I'd prefer if this change were not made because this is extremely hot code. Even single branches will have a substantial impact on parser performance (ideally, unwrap_unchecked would be used too: str is guaranteed to always be valid UTF-8, so provided we can guarantee that offset is valid, this should be fine).
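Beyond branch count, the two versions in the diff also differ in what `offset` means: the original slices at a byte offset, while `chars().skip(offset)` walks the string from the start and skips that many characters. A minimal sketch of the difference, using hypothetical free functions `next_by_bytes` and `next_by_chars` over a plain `&str` (these names and signatures are assumptions for illustration, not chumsky's `Input` trait):

```rust
/// Byte-offset lookup, modelled on the original code but written with the
/// safe `str::get` so that this sketch contains no `unsafe`.
/// (Hypothetical helper, not part of chumsky.)
fn next_by_bytes(s: &str, offset: usize) -> (usize, Option<char>) {
    match s.get(offset..).and_then(|rest| rest.chars().next()) {
        Some(c) => (offset + c.len_utf8(), Some(c)),
        None => (offset, None),
    }
}

/// Lookup as written in the proposed diff: `skip(offset)` skips `offset`
/// characters (not bytes) and iterates from the start of the string.
/// (Hypothetical helper, not part of chumsky.)
fn next_by_chars(s: &str, offset: usize) -> (usize, Option<char>) {
    let chr = s.chars().skip(offset).next();
    (offset + chr.map_or(0, char::len_utf8), chr)
}
```

For ASCII-only input the two agree, but with `"héy"` (where `é` occupies two bytes) and offset 3, the byte-offset version yields `'y'` while the character-count version runs off the end and yields `None`.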

With your comment in mind, I went ahead and ran both variations through Godbolt, and came to the conclusion that my suggestion had about 3x as many branches as yours did :D So, good call there. I was curious, though, what a hybrid variation would look like, and ended up doing this:

fn next(&self, offset: Self::Offset) -> (Self::Offset, Option<Self::Token>) {
    let chr = unsafe { self.get_unchecked(offset..).chars().next() };
    (offset + chr.map_or(0, char::len_utf8), chr)
}

The output of Godbolt can be found here, and shows that this variation has one fewer branch than the original implementation (even if the original is using unwrap_unchecked). I am curious whether you have any benchmarks that could be used to determine which of the two implementations is faster!

That seems like it might be better in that case! The json benchmark is currently the canonical way to benchmark chumsky (although it would be nice to have others in the future given that it doesn't include cases like backtracking).

I just realized why it would have one less branch, and I think we both missed something when we thought we could merge it. I believe we have to revert the changes here.

Assuming the caller passes in an offset that is too large, it's no longer going to return None, but will instead read memory that it shouldn't. I spaced that get_unchecked couldn't be used without a check, so that's my bad.
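To make the soundness point concrete, here is a minimal sketch of two ways to keep the out-of-range case returning `None`. Both are hypothetical free functions over `&str` (the names `next_guarded` and `next_safe` are assumptions for illustration, not the code that was merged): the first keeps the original bounds check in front of the unchecked slice, the second relies on the safe `str::get`, which returns `None` for an offset past the end or off a character boundary.

```rust
/// Shape of the original code: a single bounds check guards the unchecked
/// slice. Assumes (as a caller contract) that any in-range `offset` falls on
/// a UTF-8 character boundary. Hypothetical helper, not part of chumsky.
fn next_guarded(s: &str, offset: usize) -> (usize, Option<char>) {
    if offset < s.len() {
        // SAFETY: `offset < s.len()` was checked just above, and the caller
        // is assumed to pass only char-boundary offsets, so the slice is a
        // non-empty valid `str` and `chars().next()` is `Some`.
        let c = unsafe { s.get_unchecked(offset..).chars().next().unwrap_unchecked() };
        (offset + c.len_utf8(), Some(c))
    } else {
        (offset, None)
    }
}

/// Fully safe alternative: `str::get` performs the range and boundary checks
/// and returns `None` instead of reading out of bounds.
/// Hypothetical helper, not part of chumsky.
fn next_safe(s: &str, offset: usize) -> (usize, Option<char>) {
    let chr = s.get(offset..).and_then(|rest| rest.chars().next());
    (offset + chr.map_or(0, char::len_utf8), chr)
}
```

Either form gives back `(offset, None)` for an offset beyond the end of the input; which one is faster in practice would have to be measured, for example with the json benchmark mentioned above.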

zesterer · 2022-08-15T10:36:38Z

Thanks for the PR! I'm broadly in favour of this, except the change I commented on. Thanks!

Zij-IT · 2022-08-15T14:44:50Z

I have added the hybrid variant in this commit! Let me know if any more changes are necessary!

zesterer · 2022-08-15T16:19:55Z

Thanks very much, I really appreciate you taking the time to work on chumsky! Hopefully this pushes zero-copy incrementally closer to being ready to replace master :)

Zij-IT · 2022-08-15T16:41:16Z

Happy to help! It has helped me to better understand how such parsers work, and was amazing to work with. If you need more helping hands in the future, don't be afraid to tag me :D

Zij-IT added 6 commits August 13, 2022 20:49

ZeroCopyCleanup: Simplify <str as Input>::next

78f7c99

ZeroCopyCleanup: Use Option::filter to simplify InputRef::skip_while

c5abf58

ZeroCopyCleanup: Simplify append_to for Option<T>

6052638

* Option magic ;)

ZeroCopyCleanup: Replace match with ?

d2bdda4

ZeroCopyCleanup: Call rfold instead of .rev().fold() on DoubleEndedIt…

58d3e0c

…erator * This also exists on master

ZeroCopyCleanup: Run cargo fmt

b192316

Zij-IT changed the base branch from master to zero-copy August 13, 2022 22:03

zesterer reviewed Aug 15, 2022

View reviewed changes

ZeroCopyClean: Add hybrid variant instead of previous

2f70742

zesterer merged commit 2f2a4d2 into zesterer:zero-copy Aug 15, 2022

Zij-IT deleted the zero-copy-cleanup branch August 15, 2022 16:41


zesterer Aug 15, 2022

Zij-IT Aug 15, 2022 •

edited

zesterer Aug 15, 2022

Zij-IT Aug 19, 2022
