mutate()/transmute() fails to handle column names in UTF-8 on Windows #2387
Comments
Thanks. Could you please double-check that this is indeed a regression, perhaps with dplyr 0.5.0 and/or an older version (0d57562): remotes::install_github("hadley/devtools@0d57562")
@krlmlr Thanks for your quick response! I will.
Here is the result: https://gist.github.com/yutannihilation/e458b0fcf6ef784260e9d5558fa3ec68
So it doesn't seem that this behavior was introduced by the changes of the past few days.
Thanks. The version I suggested to test doesn't contain the most recent encoding-related changes. I'll make sure to test this particular example when fixing #1950.
Confirmed that b205d1f doesn't work: https://gist.github.com/yutannihilation/e458b0fcf6ef784260e9d5558fa3ec68#file-dplyr_test4-md
I guess you've already found the cause, but let me write down what I suspect. Since mutate_.tbl_df() is defined as

    mutate_.tbl_df <- function(.data, ..., .dots) {
      dots <- lazyeval::all_dots(.dots, ..., all_named = TRUE)
      mutate_impl(.data, dots)
    }

the new column names are the names of the lazy objects in dots. On the C++ side, each name is taken as a Symbol:

    Symbol name = lazy.name();

But later, the result is stored under that Symbol:

    accumulator.set(name, results[i]);
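The suspected failure mode can be modelled in a few lines of C++. This is a minimal sketch under stated assumptions, not dplyr or Rcpp code: MarkedName, same_name() and assign_column() are hypothetical, a column name is modelled as raw bytes plus an encoding mark (as R does for strings), and the comparison rule is a simplification of what happens in a non-UTF-8 locale. It shows how a name that keeps its UTF-8 bytes but loses its UTF-8 mark (as suspected for Symbol name = lazy.name()) no longer matches the existing column, so the result ends up in an extra column.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of an R string: raw bytes plus an encoding mark.
// This is NOT the Rcpp or dplyr API, only an illustration.
enum class Mark { Native, UTF8 };

struct MarkedName {
    std::string bytes;
    Mark mark;
};

bool is_ascii(const std::string& s) {
    for (unsigned char c : s) {
        if (c >= 0x80) return false;
    }
    return true;
}

// Simplified name comparison in a non-UTF-8 locale: ASCII names match on
// bytes alone; non-ASCII names only match if the bytes AND the mark agree.
// (R itself translates differently-marked strings to UTF-8 before comparing;
// this rule only approximates the outcome for this case.)
bool same_name(const MarkedName& a, const MarkedName& b) {
    if (a.bytes != b.bytes) return false;
    return is_ascii(a.bytes) || a.mark == b.mark;
}

// Mimics "overwrite the column if the name exists, otherwise append one".
// Returns the index of the column that receives the result.
std::size_t assign_column(std::vector<MarkedName>& names, const MarkedName& name) {
    assert(!name.bytes.empty() && "every result needs a name (all_named = TRUE)");
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (same_name(names[i], name)) return i;  // existing column is overwritten
    }
    names.push_back(name);  // no match: an extra column is appended
    return names.size() - 1;
}
```

For example, with names holding "x" (native) and Φ as the UTF-8 bytes "\xCE\xA6" (marked UTF-8), assigning a UTF-8-marked Φ returns index 1, the existing column, while assigning the same bytes marked Native appends a third column and returns index 2.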
Thanks for confirming this. I've started to replace all uses of
Thanks for the information!
I got it. Now I understand your comment on #1950 a bit more clearly. I'll keep watching #2388 and am always happy to help you test the change on my Windows machine :)
Oh, I was referring to a C++ class. At the R level we might use a simple wrapper around
Anyway, thanks a lot for your patience and for testing!
Ah, sorry, I confused the C++ level with the R level... I see. No problem! 👍
Confirmed that this has been fixed. Kudos for the great work! Thanks so much!!! 🍣 🍣 🍣
As suggested in #2285 (comment), the current GitHub version of mutate() and transmute() behaves strangely with UTF-8 column names on Windows. Although I haven't found the cause yet, I'm afraid this is a regression related to #1950.

Here are the error details with reprexes:
Case 1) Error with a data.frame that contains non-ASCII colnames

I found that mutate() adds a strange column for non-ASCII columns.

Case 2) Error with a data.frame that contains non-ASCII colnames in UTF-8
If the non-ASCII colname is UTF-8-encoded, mutate() does not add the strange column but instead replaces the existing columns with it.

Details
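The strange column name can be reproduced at the byte level with a short C++ sketch. It assumes the Windows locale uses code page 932 (Japanese), which the report does not state explicitly. Φ (U+03A6) is the two bytes 0xCE 0xA6 in UTF-8; read as CP932 instead, each byte is a single-byte half-width katakana, ﾎ (U+FF8E) and ｦ (U+FF66), which would be the half-width form of the ホヲ name. utf8_encode() and decode_cp932_halfwidth() are hypothetical helpers, not R, Rcpp or dplyr functions, and the decoder covers only ASCII and the half-width katakana range 0xA1 to 0xDF, not the double-byte part of CP932.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Encode one Unicode code point (BMP only) as UTF-8 bytes.
std::string utf8_encode(char32_t cp) {
    assert(cp < 0x10000 && "this sketch only handles the Basic Multilingual Plane");
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                         // 1 byte:  0xxxxxxx
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));           // 2 bytes: 110xxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         //          10xxxxxx
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));          // 3 bytes: 1110xxxx
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));  //          10xxxxxx
        out += static_cast<char>(0x80 | (cp & 0x3F));         //          10xxxxxx
    }
    return out;
}

// Hypothetical, partial CP932 decoder: ASCII bytes plus the single-byte
// half-width katakana range 0xA1..0xDF, mapped linearly to U+FF61..U+FF9F.
// Any other byte (e.g. a double-byte lead byte) becomes U+FFFD.
std::vector<char32_t> decode_cp932_halfwidth(const std::string& bytes) {
    std::vector<char32_t> out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char32_t>(b));
        } else if (b >= 0xA1 && b <= 0xDF) {
            out.push_back(static_cast<char32_t>(0xFF61 + (b - 0xA1)));
        } else {
            out.push_back(U'\uFFFD');
        }
    }
    return out;
}
```

For example, decode_cp932_halfwidth(utf8_encode(0x03A6)) returns the code points 0xFF8E and 0xFF66. The bytes are never changed; only their interpretation is, which fits the observation that re-marking the name with Encoding() fixes the problem, since Encoding<- sets the declared encoding without translating the bytes.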
Then, what is this mysterious character ホヲ? It is actually Φ converted to UTF-8, but unfortunately it has lost its Encoding attribute. This is why the non-ASCII column names are mishandled in non-UTF-8 environments. So if I set Encoding() to "UTF-8", it starts to work fine again.

My environment