Forward-reverse grapheme mismatch on "\u{1F938}\u{1F3FE}\u{1F3FE}" #19

Manishearth opened this Issue Feb 23, 2017 · 1 comment



Manishearth commented Feb 23, 2017

Found by cargo-fuzz.

This panics on the final assertion:

extern crate unicode_segmentation;

use unicode_segmentation::UnicodeSegmentation;
fn main() {
    let s = "\u{1F938}\u{1F3FE}\u{1F3FE}";
    let forward = UnicodeSegmentation::graphemes(s, true).collect::<Vec<_>>();
    let forward_reversed = forward.into_iter().rev().collect::<Vec<_>>();
    let reverse = UnicodeSegmentation::graphemes(s, true).rev().collect::<Vec<_>>();
    assert_eq!(forward_reversed, reverse);
}

It panics with (manually escaped):

left: `["\u{1F938}\u{1F3FE}\u{1F3FE}"]`, right: `["\u{1F3FE}", "\u{1F938}\u{1F3FE}"]`

The original emoji is (uniview) U+1F938 PERSON DOING CARTWHEEL followed by two Fitzpatrick skin color modifiers. I suspect this error will happen whenever you have two skin color modifiers after a modifiable emoji.
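One way to probe that suspicion would be to generate every base-plus-two-modifiers string and feed each one to the same forward/reverse comparison as the repro. The sketch below only builds the inputs, using the standard library; the names `SKIN_TONE_MODIFIERS` and `double_modifier_inputs` are hypothetical and not part of unicode-segmentation, and whether every generated input actually triggers the mismatch is untested here.

```rust
/// The five emoji modifiers U+1F3FB through U+1F3FF (Fitzpatrick skin tones).
const SKIN_TONE_MODIFIERS: [char; 5] = [
    '\u{1F3FB}',
    '\u{1F3FC}',
    '\u{1F3FD}',
    '\u{1F3FE}',
    '\u{1F3FF}',
];

/// Hypothetical helper: builds `base` followed by every ordered pair of
/// skin tone modifiers, giving 5 * 5 = 25 candidate strings.
fn double_modifier_inputs(base: char) -> Vec<String> {
    let mut inputs = Vec::new();
    for &first in SKIN_TONE_MODIFIERS.iter() {
        for &second in SKIN_TONE_MODIFIERS.iter() {
            let mut s = String::new();
            s.push(base);
            s.push(first);
            s.push(second);
            inputs.push(s);
        }
    }
    inputs
}
```

Calling `double_modifier_inputs('\u{1F938}')` yields a list that includes the string from the repro above; other modifiable base emoji could be substituted to see whether the mismatch depends on the base character.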

cc @mbrubeck @kwantam




Manishearth commented Feb 23, 2017

We have another mismatch on \u{200D}\u{200D}\u{2764}\u{2764}; the forward iterator considers it a single grapheme but the reverse iterator splits off the last heart into its own grapheme. This is likely a separate bug, but I'm listing it here.
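Both mismatches violate the same property: collecting an iterator forward and reversing the result should equal collecting it through `.rev()`. A minimal sketch of that check, written against any `DoubleEndedIterator` over `&str` so it needs only the standard library, might look like the following. The helper names `forward_and_reverse` and `iterators_agree` are hypothetical and do not come from the crate.

```rust
/// Hypothetical helper: runs the iterator built by `make_iter` twice over `s`,
/// once forward (then reversed in place) and once via `.rev()`, and returns
/// both segment lists so a mismatch can be inspected.
fn forward_and_reverse<'a, I, F>(s: &'a str, make_iter: F) -> (Vec<&'a str>, Vec<&'a str>)
where
    F: Fn(&'a str) -> I,
    I: DoubleEndedIterator<Item = &'a str>,
{
    let mut forward_reversed: Vec<&'a str> = make_iter(s).collect();
    forward_reversed.reverse();
    let reverse: Vec<&'a str> = make_iter(s).rev().collect();
    (forward_reversed, reverse)
}

/// Returns true when both directions produce the same segmentation.
fn iterators_agree<'a, I, F>(s: &'a str, make_iter: F) -> bool
where
    F: Fn(&'a str) -> I,
    I: DoubleEndedIterator<Item = &'a str>,
{
    let (forward_reversed, reverse) = forward_and_reverse(s, make_iter);
    forward_reversed == reverse
}
```

With unicode-segmentation in scope, the closure would presumably be `|s| UnicodeSegmentation::graphemes(s, true)`, the same call used in the repro; a standard-library iterator such as `|s| s.split(' ')` is enough to exercise the helper on its own.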

raphlinus added a commit to raphlinus/unicode-segmentation that referenced this issue Mar 6, 2017

Additional test case
This adds a test case for unicode-rs#19 (which was a mismatch between forward
and reverse iterators in the original codebase).