Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

panic when index inside unicode sequence - Questions about scope of this crate #120

Closed
KMikeeU opened this issue Apr 10, 2023 · 1 comment

Comments

@KMikeeU
Copy link

KMikeeU commented Apr 10, 2023

I stumbled upon an issue, where calling functions like GraphemeCursor::is_boundary would panic whenever the index is in the middle of a unicode sequence (ie the index is not at a 'char' boundary). For example:

#[test]
fn test_grapheme_cursor_inside_unicode() {
    // 0xf0 0x9f 0x98 0x82
    let s = "😂";
    // Start the cursor at the second byte, which is inside the unicode sequence for the emoji
    let mut c = GraphemeCursor::new(1, s.len(), true);

    assert_eq!(
        c.is_boundary(s, 0),
        Ok(false)
    )
}

This panics at is_boundary:

---- grapheme::test_grapheme_cursor_inside_unicode stdout ----
thread 'grapheme::test_grapheme_cursor_inside_unicode' panicked at 'byte index 1 is not a char boundary; it is inside '😂' (bytes 0..4) of `😂`', src/grapheme.rs:553:22
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/panicking.rs:64:14
   2: core::str::slice_error_fail_rt
   3: core::str::slice_error_fail
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/str/mod.rs:86:9
   4: core::str::traits::<impl core::slice::index::SliceIndex<str> for core::ops::range::RangeFrom<usize>>::index
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/str/traits.rs:370:21
   5: core::str::traits::<impl core::ops::index::Index<I> for str>::index
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/str/traits.rs:65:9
   6: unicode_segmentation::grapheme::GraphemeCursor::is_boundary
             at ./src/grapheme.rs:553:22
   7: unicode_segmentation::grapheme::test_grapheme_cursor_inside_unicode
             at ./src/grapheme.rs:809:9
   8: unicode_segmentation::grapheme::test_grapheme_cursor_inside_unicode::{{closure}}
             at ./src/grapheme.rs:804:42
   9: core::ops::function::FnOnce::call_once
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/ops/function.rs:250:5
  10: core::ops::function::FnOnce::call_once
             at /rustc/9eb3afe9ebe9c7d2b84b71002d44f4a0edac95e0/library/core/src/ops/function.rs:250:5

Is this crate supposed to handle such issues? Or are they supposed to be prevented by checking these conditions before calling functions that can panic?

@Manishearth
Copy link
Member

Yeah, in the current implementation is panicky here. I think yes, you should check the conditions before calling these functions. We could probably in the long run make them more robus by making them fallible.

In general the cursor APIs have some bugs and need more polish, unfortunately. I haven't had the time to look into that.

@Manishearth Manishearth closed this as not planned Won't fix, can't repro, duplicate, stale Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants