Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid error when an input contains non-ASCII (1-byte) characters #11

Merged
merged 3 commits into from
Oct 6, 2022
Merged

Avoid error when an input contains non-ASCII (1-byte) characters #11

merged 3 commits into from
Oct 6, 2022

Conversation

rinarakaki
Copy link
Contributor

When an input contains non-ASCII (1-byte) characters, slicing &str by 1 violates the character boundaries and causes panic.

@wezm
Copy link
Owner

wezm commented Oct 5, 2022

Do you have a test case that causes a panic? The slice is guarded by the call to starts_with_bracket so it should never panic as we've checked that the first character is a valid single byte UTF-8 character. I.e. ( U+0028.

@rinarakaki
Copy link
Contributor Author

@wezm Please try to pass 'μ' as the input. Or any other greek letters ('π'). By the way, is there an option to convert only the Latin alphabet?

Copy link
Owner

@wezm wezm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please try to pass 'μ' as the input. Or any other greek letters ('π')

Ahh that does panic. Thanks.

Would be great if you could add a test as well. There's a bunch at the bottom of lib.rs that should serve as an example.

src/lib.rs Outdated
@@ -117,7 +117,7 @@ fn process_word(word: &str) -> Cow<'_, str> {
if SMALL_RE.is_match(&lower_word) {
Cow::from(lower_word)
} else if starts_with_bracket(word) {
let rest = titlecase(&word[1..]);
let rest = titlecase(&word.chars().skip(1).collect::<String>());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The slice in this case is guarded by starts_with_bracket so it can't cause a panic because we only get in here if the first character is (.

src/lib.rs Outdated
@@ -157,7 +157,7 @@ fn has_internal_caps(word: &str) -> bool {
}

fn has_internal_slashes(word: &str) -> bool {
!word.is_empty() && word[1..].contains('/')
!word.is_empty() && word.chars().skip(1).collect::<String>().contains('/')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is the source of the panic. There's no need to allocate a string though, we can iterate over the characters and check if one is a slash.

Something like:

fn has_internal_slashes(word: &str) -> bool {
    word.chars().skip(1).any(|chr| chr == '/')
}

@wezm
Copy link
Owner

wezm commented Oct 6, 2022

By the way, is there an option to convert only the Latin alphabet?

Do you mean something like passing in "μs" and have it come out as "us"? If so, that's probably out of scope for this library but there are others that could do it. I've used deunicode in another project.

@rinarakaki
Copy link
Contributor Author

@wezm I mean I want to leave small greek letters as are when they are used as special mathematical operators or constants:

  • π → π
  • λ-calculus → λ-Calculus

@wezm
Copy link
Owner

wezm commented Oct 6, 2022

@wezm I mean I want to leave small greek letters as are when they are used as special mathematical operators or constants:

* π → π

* λ-calculus → λ-Calculus

Ahh interesting. I see that it's upper-casing the greek letters too.

I can see why you'd want to avoid that. I think it would be fairly easy to support a mode where only ASCII characters were upper-cased.

@wezm wezm merged commit d1f0507 into wezm:master Oct 6, 2022
@rinarakaki
Copy link
Contributor Author

@wezm Thank you for reviewing and merging my request. Will look forward to the updated crate soon!

@wezm wezm mentioned this pull request Oct 6, 2022
@wezm
Copy link
Owner

wezm commented Oct 6, 2022

v2.2.1 has been published now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants