Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zero copy cleanup #190

Merged
merged 7 commits into from Aug 15, 2022
Merged

Zero copy cleanup #190

merged 7 commits into from Aug 15, 2022

Conversation

Zij-IT
Copy link
Contributor

@Zij-IT Zij-IT commented Aug 13, 2022

Decided since I am super excited for the Zero-Copy implementation that is up-and-coming, I would try an contribute a little back in the form of some small cleanup commits.

Most of these were just some bits that I found while perusing through the code, and thought they could be done a tad simpler. Let me know if I missed something, or if something needs changing!

Thanks for your time!

@zesterer
Copy link
Owner

Hey, this looks like great work! Is there any chance you could switch the target branch to zero-copy?

@Zij-IT Zij-IT changed the base branch from master to zero-copy August 13, 2022 22:03
@Zij-IT
Copy link
Contributor Author

Zij-IT commented Aug 13, 2022

Whoops! Totally thought I had it set to zero-copy. Its fixed!

Comment on lines 38 to 39
if offset < self.len() {
// TODO: Can we `unwrap_unchecked` here?
let c = unsafe { self.get_unchecked(offset..).chars().next().unwrap() };
(offset + c.len_utf8(), Some(c))
} else {
(offset, None)
}
let chr = self.chars().skip(offset).next();
(offset + chr.map_or(0, char::len_utf8), chr)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if this change were not made because this is extremely hot code. Even single branches will have a substantial impact on parser performance (ideally. unwrap_unchecked would be used too: str is guaranteed to always be valid UTF-8, so provided we can guarantee that offset is valid, this should be fine).

Copy link
Contributor Author

@Zij-IT Zij-IT Aug 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With your comment in mind, I went ahead and ran both variations through Godbolt , and came to the conclusion that my suggestion had about 3x the amount of branches that yours did :D So, good call there. I was curious though what a hybrid variation would look like, and ended up doing this:

fn next(&self, offset: Self::Offset) -> (Self::Offset, Option<Self::Token>) {
    let chr = unsafe { self.get_unchecked(offset..).chars().next() };
    (offset + chr.map_or(0, char::len_utf8), chr)
}

The output of Godbolt can be found here, and shows that this variation has one less branch than the original implementation (even if the original is using unwrapped_unchecked). I am curious on if you would have any benchmarks that could be used to determine which of the two implementations proves faster!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems like it might be better in that case! The json benchmark is currently the canonical way to benchmark chumsky (although it would be nice to have others in the future given that it doesn't include cases like backtracking).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized why it would have one less branch, and I think we both missed something when we thought we could merge it. I believe we have to revert the changes here.

Assuming the called passes in an offset that is too large, its no longer going to return none, but instead read memory that it shouldn't. I spaced that get_unchecked couldn't be used without a check, so that's my bad.

@zesterer
Copy link
Owner

Thanks for the PR! I'm broadly in favour of this, except the change I commented on. Thanks!

@Zij-IT
Copy link
Contributor Author

Zij-IT commented Aug 15, 2022

I have added the hybrid variant in this commit! Let me know if any more changes are necessary!

@zesterer
Copy link
Owner

Thanks very much, I really appreciate you taking the time to work on chumsky! Hopefully this pushes zero-copy incrementally closer to being ready to replace master :)

@zesterer zesterer merged commit 2f2a4d2 into zesterer:zero-copy Aug 15, 2022
@Zij-IT
Copy link
Contributor Author

Zij-IT commented Aug 15, 2022

Happy to help! It has helped me to better understand how such parsers work, and was amazing to work with. If you need more helping hands in the future, don't be afraid to tag me :D

@Zij-IT Zij-IT deleted the zero-copy-cleanup branch August 15, 2022 16:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants