-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle Turkish dotted and dotless i properly #3011
Handle Turkish dotted and dotless i properly #3011
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach seems good. It would be nice to have a unit test, but I can't think of an obvious way to simulate it.
R/utilities.r
Outdated
# Use chartr() for safety since toupper() fails to convert i to I in Turkish locale | ||
lower_ascii <- "abcdefghijklmnopqrstuvwxyz" | ||
upper_ascii <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ" | ||
tolower_ascii <- function(x) chartr(upper_ascii, lower_ascii, x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this to_lower_ascii()
and to_upper_ascii()
?
And I think we should wrap to_lower()
and to_upper()
to throw an error, to make sure that this error doesn't arise again in the future.
Thanks, I added some unit tests. Do you think it's overkill to install 5f1bb92 |
Ah, sorry, it seems I'm doing wrong about detecting the available locales... Let me fix.
|
Sorry my comment was supposed to reinforce not having a unit test. I don't think it gets us much here as it's likely to be fragile, and we've reduced the chance by overwriting |
R/utilities.r
Outdated
} | ||
|
||
toupper <- function(x) { | ||
stop('Please use `to_upper_ascii()`, which works fine in Turkish locale.', call. = FALSE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"in Turkish locale" -> "in all locales"
Oh, I got it wrong, sorry. Agreed. |
Thanks for confirmation, @cystein! |
* Use chartr() instad of toupper() and tolower()
* Use chartr() instad of toupper() and tolower()
* Use chartr() instad of toupper() and tolower()
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
Fix #3000
Turkish has two versions of i; dotted and dotless. Accordingly, the capital of
i
isİ
, notI
, which make ggplot2 to fail to findPositionIdentity
andStatIdentity
. The same thing can happen on dotlessı
andI
(c.f. http://www.i18nguy.com/unicode/turkish-i18n.html)This seems a well-known problem, and ICU has a flag for this, but unfortunately, there's no way to use this via R's
toupper()
.After googling around, I found several options to fix this:
chartr()
instead oftoupper()
andtolower()
(This PR)stringi::stri_trans_toupper()
andstringi::stri_trans_tolower()
, where we can specifylocale
.For 2., stringi is cool, but adding it to the dependency is a bit heavy. For 3., it seems dangerous to mess the user's locale. So, from these, I think option 1 is rather reasonable, although not very cool.
chartr()
is around 4x slower thantoupper()
, but I hope this is acceptable considering that people in Turkish locale probably almost cannot use ggplot2... (I'm not familiar with this problem, so I might be wrong.)