New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix glue encoding #81

Merged
merged 8 commits into from Feb 12, 2018

Conversation

Projects
None yet
2 participants
@dpprdan
Contributor

dpprdan commented Feb 9, 2018

glue() does not always return UTF-8 encoded strings

a <- "ä"
Encoding(glue("{a}"))
#> [1] "latin1"

"That is strange, because there is even a test for that", you might think. Well, that test does not actually test for identical encodings, because identical() and by extension expect_identical() do not care about encodings.

x <- "fa\xE7ile"
Encoding(x) <- "latin1"
identical(x, enc2utf8(x))
#> [1] TRUE

So I fixed the test by explicitly testing for Encoding() and added enc2utf8() to both glue() and collapse().

@dpprdan

This comment has been minimized.

Contributor

dpprdan commented Feb 9, 2018

from help(identical):
"Character strings are regarded as identical if they are in different marked encodings but would agree when translated to UTF-8."

FYI I stumbled upon this, while trying to construct a SQL query with glue_sql and RPostgreSQL complains when the query is not encoded as UTF-8.

@jimhester

This comment has been minimized.

Member

jimhester commented Feb 9, 2018

Thanks, you are correct. I think a slightly more robust way to handle this is to add enc2utf8() to as_glue.character

diff --git a/R/glue.R b/R/glue.R
index 472f079..f9ab287 100644
--- a/R/glue.R
+++ b/R/glue.R
@@ -227,7 +227,7 @@ as_glue.glue <- function(x, ...) {
 #' @export
 as_glue.character <- function(x, ...) {
   class(x) <- c("glue", "character")
-  x
+  enc2utf8(x)
 }

Also I would prefer leaving the current expectations as they are and adding an additional expectation explicitly testing for "UTF-8", e.g. expect_equal(Encoding(glue("{x}")), "UTF-8")

@dpprdan

This comment has been minimized.

Contributor

dpprdan commented Feb 10, 2018

Excellent ideas. Actually I had thought about as_glue but wasn't sure about side effects I might not be aware of.

A bit off-topic: When I run devtools::test() I get errors unless I do set the language to English, i.e. Sys.setenv(LANGUAGE = "en"). By default it is set to German, but then some messages are not what they are expected to be. Do you know if this is the intended behaviour (i.e. users should be aware of this) or should I file an issue?

@jimhester

This comment has been minimized.

Member

jimhester commented Feb 12, 2018

Thanks, looks good now!

CRAN always runs tests in English locale, and unfortunately there isn't a robust way to write non-brittle tests that respect the users' non-english locale.

@jimhester jimhester merged commit 3511d30 into tidyverse:master Feb 12, 2018

4 checks passed

codecov/patch No report found to compare against
Details
codecov/project No report found to compare against
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@dpprdan

This comment has been minimized.

Contributor

dpprdan commented Feb 12, 2018

👍
I assumed that English locale is the default. I was just wondering whether testthat should always Sys.setenv(LANGUAGE = "en") automatically.

@jimhester

This comment has been minimized.

Member

jimhester commented Feb 12, 2018

We experimented with that, but there are some cases where a given string is only translated once, even if you later switch encodings, so even doing that is not enough. R CMD check should always run code in the C locale IIRC, so doing that is one way to ensure the tests pass regardless of local language.

@dpprdan

This comment has been minimized.

Contributor

dpprdan commented Feb 12, 2018

Good to know, thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment