Encoding issue: the results with non-ASCII symbols were not reproduced #197

GegznaV · 2018-07-14T14:33:03Z

I use RStudio on Windows 10. I selected these lines from my .Rmd file:

Sys.setlocale(locale = "Lithuanian")
df <- data.frame(x = 1:5, y = c("Ą", "Č", "Ę", "ū", "ž"))

Sys.setlocale(locale = "Chinese")
capture.output(skimr::skim(df))

Called the reprex RStudio add-in (for selection of R code to be printed on GitHub) and got this result (pay attention to non-ASCII symbols Ą, etc.):

Sys.setlocale(locale = "Lithuanian")
#> [1] "LC_COLLATE=Lithuanian_Lithuania.1257;LC_CTYPE=Lithuanian_Lithuania.1257;LC_MONETARY=Lithuanian_Lithuania.1257;LC_NUMERIC=C;LC_TIME=Lithuanian_Lithuania.1257"
df <- data.frame(x = 1:5, y = c("<U+0104>", "<U+010C>", "<U+0118>", "¨±", "<U+017E>"))

Sys.setlocale(locale = "Chinese")
#> [1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese (Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.936"
capture.output(skimr::skim(df))
#>  [1] "Skim summary statistics"                                                              
#>  [2] " n obs: 5 "                                                                           
#>  [3] " n variables: 2 "                                                                     
#>  [4] ""                                                                                     
#>  [5] "-- Variable type:factor -------------------------------------------------------------"
#>  [6] " variable missing complete n n_unique                     top_counts"                 
#>  [7] "        y       0        5 5        5 <U+: 1, <U+: 1, <U+: 1, <U+: 1"                 
#>  [8] " ordered"                                                                             
#>  [9] "   FALSE"                                                                             
#> [10] ""                                                                                     
#> [11] "-- Variable type:integer ------------------------------------------------------------"
#> [12] " variable missing complete n mean   sd p0 p25 p50 p75 p100     hist"                  
#> [13] "        x       0        5 5    3 1.58  1   2   3   4    5 <U+00D8>~<U+00D8>~<U+00D8>x<U+00D8>~<U+00D8>x<U+00D8>~<U+00D8>x<U+00D8>~"

Created on 2018-07-14 by the reprex package (v0.2.0).

The result should have been an error:

[1] "LC_COLLATE=Lithuanian_Lithuania.1257;LC_CTYPE=Lithuanian_Lithuania.1257;LC_MONETARY=Lithuanian_Lithuania.1257;LC_NUMERIC=C;LC_TIME=Lithuanian_Lithuania.1257"
[1] "LC_COLLATE=Chinese (Simplified)_China.936;LC_CTYPE=Chinese (Simplified)_China.936;LC_MONETARY=Chinese (Simplified)_China.936;LC_NUMERIC=C;LC_TIME=Chinese (Simplified)_China.936"
Error in substr(names(x), 1, options$formats$.levels$max_char) : invalid multibyte string at '<c0>'

This behavior may be related to this issue.
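As a rough illustration of where the `<c0>` in the error message could come from, the sketch below re-encodes "Ą" to the Lithuanian code page and inspects the resulting byte. It assumes that `iconv()` on the current platform knows the encoding name "CP1257", and that the `<c0>` in the message is this byte being reinterpreted after the switch to the Chinese locale, where 0xC0 is only the first half of a two-byte character. The variable names are made up for the example.

```r
# "\u0104" is the letter A with ogonek; the escape keeps this source pure ASCII.
a_ogonek <- "\u0104"

# Re-encode to the Lithuanian Windows code page and look at the raw bytes.
# Assumption: iconv() on this platform supports the name "CP1257".
cp1257_bytes <- iconv(a_ogonek, from = "UTF-8", to = "CP1257", toRaw = TRUE)[[1]]
print(cp1257_bytes)

# A string holding only that byte is not valid UTF-8 either, so it cannot be
# passed around safely once the locale it was written in is gone.
lone_byte <- rawToChar(as.raw(0xc0))
lone_byte_is_valid_utf8 <- validUTF8(lone_byte)
print(lone_byte_is_valid_utf8)
```

If the first `print()` shows `c0`, the byte matches the one named in the error message.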


batpigandme · 2018-07-14T14:44:31Z

Possibly related? #82

yutannihilation · 2018-07-15T01:50:55Z

This issue lies in R core and is hard to fix on the package side... 😢

r-lib/evaluate#59

GegznaV · 2018-07-15T19:28:14Z

Is there any chance that the issue will be solved by the R Core Team?

isteves · 2018-07-27T15:56:53Z

Related issue (with a Hebrew example): tidyverse/tibble#433

dpprdan · 2018-11-14T19:47:57Z

When changing the locale with Sys.setlocale(), the last Sys.setlocale() call seems to determine the locale for the whole reprex session, even for code that comes before that call.
First, with only one call to Sys.setlocale():

Sys.setlocale(locale = "English")
#> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
print("ö")
#> [1] "ö"

Now with a second call (I “reprexed” the part above and everything from here on down separately):

Sys.setlocale(locale = "English")
#> [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
print("v")
#> [1] "<f6>"
Sys.setlocale(locale = "C")
#> [1] "C"

I think this is what causes @GegznaV’s problem. I am not sure, though, whether this is really related to the other issues mentioned above.

BTW, “ö” should not get converted to “v” but to “U+00F6”, IMO, but that might be a bug in base R.
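To make the different representations of "ö" concrete, here is a small sketch that derives the Unicode code point label and the byte sequences in Latin-1 and UTF-8. It only uses base R; the variable names are chosen for the example and do not come from the thread.

```r
# "\u00f6" is o with diaeresis, written as an escape so the source stays ASCII.
o_umlaut <- "\u00f6"

# Unicode code point, formatted the way R prints escapes such as <U+00F6>.
code_point_label <- sprintf("U+%04X", utf8ToInt(o_umlaut))

# Single byte used by Latin-1 / Windows-1252, which R prints as <f6>
# when it cannot display the character.
latin1_bytes <- iconv(o_umlaut, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]]

# Two-byte UTF-8 representation of the same character.
utf8_bytes <- charToRaw(enc2utf8(o_umlaut))

print(code_point_label)  # "U+00F6"
print(latin1_bytes)      # f6
print(utf8_bytes)        # c3 b6
```

So `<f6>` is a byte-level escape for the Latin-1 encoding of the character, while `U+00F6` names the character itself independently of any encoding.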

jennybc · 2019-05-18T21:12:44Z

I think between:

use of knitr version 1.23
Use read_utf8() and write_utf8() #237
Specify UTF-8 encoding to rmarkdown::render() #261

reprex is handling encoding as well as its dependencies allow (most especially the difficulties around encoding on Windows in R itself). I'm closing this. If anyone has a new challenging example, especially one that fails with dev reprex + knitr v1.23, please add it to #262.

I don't consider these reprexes that change the locale mid-reprex to be fair game, unless one can show that rmarkdown/knitr handles that code correctly, but reprex does not.
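For readers who want an example that does not depend on the source file encoding or on the active locale, one common approach is to write the non-ASCII characters as `\u` escapes. This is an assumption about what might help, not something proposed in this thread, and it does not change how the output is displayed in a locale that cannot represent the characters. The sketch rebuilds the data frame from the original report that way.

```r
# The five letters from the original report, written as ASCII-only escapes:
# U+0104, U+010C, U+0118, U+016B, U+017E.
y <- c("\u0104", "\u010c", "\u0118", "\u016b", "\u017e")

# Keep y as character so the encoding mark of each string can be inspected.
df <- data.frame(x = 1:5, y = y, stringsAsFactors = FALSE)

# Strings created from \u escapes are marked as UTF-8 regardless of the locale.
print(Encoding(df$y))

# Each element is a single character, whatever the current code page is.
print(nchar(df$y, type = "chars"))
```

Because the source is pure ASCII, reading or re-saving the file in a different encoding cannot alter the strings.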

jennybc added the encoding label May 15, 2019

jennybc closed this as completed May 18, 2019
