As suggested in #2285 (comment), the current GitHub version of mutate() and transmute() behaves strangely for UTF-8 column names on Windows. I haven't found the cause yet, but I suspect this is a regression related to #1950.
Here are the error details with reprexes:
Case 1) Error with a data.frame that contains non-ASCII colnames
I found that mutate() adds a new, garbled column instead of overwriting the existing non-ASCII column.
library(dplyr)
df1 <- data_frame(Φ = 1)
df1
#> # A tibble: 1 × 1
#> Φ
#> <dbl>
#> 1 1
df1_mutated <- df1 %>% mutate(Φ = Φ * 2)
df1_mutated
#> # A tibble: 1 × 2
#> Φ ホヲ
#> <dbl> <dbl>
#> 1 1 2
Case 2) Error with a data.frame that contains non-ASCII colnames in UTF-8
If the non-ASCII colname is explicitly UTF-8-encoded, mutate() does not add a column; instead it replaces the existing column with the garbled one.
df2 <- df1
colnames(df2) <- enc2utf8(colnames(df2))
df2_mutated <- df2 %>% mutate(Φ = Φ * 2)
df2_mutated
#> # A tibble: 1 × 1
#> ホヲ
#> <dbl>
#> 1 2
Details
So what is this mysterious character ホヲ?
It is actually the UTF-8 byte sequence of Φ, but the string has lost its encoding mark (Encoding() reports "unknown"), so R interprets the bytes in the native encoding. This is why non-ASCII column names are mishandled in non-UTF-8 environments.
charToRaw(colnames(df2_mutated))
#> [1] ce a6
charToRaw(enc2utf8("Φ"))
#> [1] ce a6
Encoding(colnames(df2_mutated))
#> [1] "unknown"
So if I set the Encoding() of the column names back to "UTF-8", they are displayed correctly again.
colnames(df2_mutated) <- `Encoding<-`(colnames(df2_mutated), "UTF-8")
df2_mutated
#> # A tibble: 1 × 1
#> Φ
#> <dbl>
#> 1 2
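As a stopgap until the cause is found, the manual fix above could be wrapped in a small helper. This is only a sketch: `mark_names_utf8()` is a hypothetical name, not part of dplyr, and it assumes that any unmarked column name whose bytes form valid UTF-8 really is UTF-8. That assumption can be wrong in a non-UTF-8 locale, where a native-encoded name may happen to be byte-identical to a UTF-8 one (as ホヲ is here).

```r
# Hypothetical workaround helper (not part of dplyr).
# Re-marks column names as UTF-8 when they carry no encoding mark
# but their bytes are valid UTF-8. Pure-ASCII names are unaffected,
# because R never marks ASCII strings.
mark_names_utf8 <- function(df) {
  nms <- names(df)
  # validUTF8() needs R >= 3.3.0
  unmarked <- Encoding(nms) == "unknown" & validUTF8(nms)
  Encoding(nms[unmarked]) <- "UTF-8"
  names(df) <- nms
  df
}
```

It would be used right after the affected verb, e.g. `df2 %>% mutate(Φ = Φ * 2) %>% mark_names_utf8()`. It only changes the encoding mark, never the bytes, so it does not help with Case 1, where a second column has already been created.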
My environment
library(dplyr)
packageVersion("dplyr")
#> [1] '0.5.0.9000'
# the installed version is at the point of this commit:
# https://github.com/hadley/dplyr/commit/8aa1bdb8fe95b741fb9411dbccd1b3af2f631dfc
packageDescription("dplyr")$GithubSHA1
#> [1] "8aa1bdb8fe95b741fb9411dbccd1b3af2f631dfc"
Sys.getlocale()
#> [1] "LC_COLLATE=Japanese_Japan.932;LC_CTYPE=Japanese_Japan.932;LC_MONETARY=Japanese_Japan.932;LC_NUMERIC=C;LC_TIME=Japanese_Japan.932"