Warning with the UCRT build on windows #1337

Closed
jimhester opened this issue Nov 29, 2021 · 3 comments

Comments

@jimhester
Collaborator

Quitting from lines 141-166 (locales.Rmd)
Error: processing vignette 'locales.Rmd' failed with diagnostics:
translating strings with "bytes" encoding is not allowed
@DavisVaughan
Member

e72a281

@damianooldoni

I am not sure whether this is related to this issue, but I run R on a Windows machine, and when I try to reproduce the example in the vignette, I get this:

library(stringi)
#> Warning: package 'stringi' was built under R version 4.1.2
x <- "Émigré cause célèbre déjà vu.\n"
y <- stri_conv(x, "UTF-8", "latin1")
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffc9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode

#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe8 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe0 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

# These strings look like they're identical:
x
#> [1] "Émigré cause célèbre déjà vu.\n"
y
#> [1] "\032migr\032 cause c\032l\032bre d\032j\032 vu.\n"
identical(x, y)
#> [1] FALSE

# But they have different encodings:
Encoding(x)
#> [1] "latin1"
Encoding(y)
#> [1] "unknown"

Created on 2022-02-18 by the reprex package (v2.0.1)

I can reproduce the example if I replace the line y <- stri_conv(x, "UTF-8", "latin1") with y <- stri_conv(x, from = "latin1", to = "UTF-8").
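
A minimal base-R sketch of why the argument order may matter here. In the reprex above, Encoding(x) is reported as "latin1", so declaring the input as "UTF-8" asks the converter to read Latin-1 bytes as UTF-8. The sketch uses base R's iconv() and validUTF8() as stand-ins for stri_conv() (an assumption made so the example needs no extra packages), and it builds the string from raw bytes so that it does not depend on the session locale:

```r
# Assumption: these are the Latin-1 bytes for "Émigré", built from raw values
# so the example does not depend on the encoding of the source file.
latin1_bytes <- as.raw(c(0xC9, 0x6D, 0x69, 0x67, 0x72, 0xE9))
x <- rawToChar(latin1_bytes)
Encoding(x) <- "latin1"

# Read as UTF-8, the bytes 0xC9 and 0xE9 do not start valid sequences,
# which is consistent with the "could not be converted" warnings above.
x_is_valid_utf8 <- validUTF8(x)

# Declaring the actual source encoding (latin1) converts cleanly to UTF-8.
y <- iconv(x, from = "latin1", to = "UTF-8")
y_is_valid_utf8 <- validUTF8(y)
y_bytes <- charToRaw(y)
```

Here x_is_valid_utf8 is FALSE and y_is_valid_utf8 is TRUE: each accented letter takes one byte in Latin-1 and two bytes in UTF-8, so y_bytes holds eight bytes instead of six.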

Some details about my machine from sessionInfo():

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252 
[2] LC_CTYPE=Dutch_Belgium.1252   
[3] LC_MONETARY=Dutch_Belgium.1252
[4] LC_NUMERIC=C                  
[5] LC_TIME=Dutch_Belgium.1252

@hadley
Member

hadley commented Jul 31, 2023

I assume that this has been fixed.
