Warning with the UCRT build on windows #1337

jimhester · 2021-11-29T15:00:49Z

Quitting from lines 141-166 (locales.Rmd)
Error: processing vignette 'locales.Rmd' failed with diagnostics:
translating strings with "bytes" encoding is not allowed

DavisVaughan · 2021-11-29T23:38:12Z

e72a281

damianooldoni · 2022-02-18T18:36:00Z

I am not sure whether this is related to this issue, but I run R on a Windows machine, and when I try to reproduce the example in the vignette I get this:

library(stringi)
#> Warning: package 'stringi' was built under R version 4.1.2
x <- "Émigré cause célèbre déjà vu.\n"
y <- stri_conv(x, "UTF-8", "latin1")
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffc9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode

#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe8 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe9 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): input data \xffffffe0 in the current
#> source encoding could not be converted to Unicode
#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

#> Warning in stri_conv(x, "UTF-8", "latin1"): the Unicode code point \U0000fffd
#> cannot be converted to destination encoding

# These strings look like they're identical:
x
#> [1] "Émigré cause célèbre déjà vu.\n"
y
#> [1] "\032migr\032 cause c\032l\032bre d\032j\032 vu.\n"
identical(x, y)
#> [1] FALSE

# But they have different encodings:
Encoding(x)
#> [1] "latin1"
Encoding(y)
#> [1] "unknown"

Created on 2022-02-18 by the reprex package (v2.0.1)

I can reproduce the example if I replace the line y <- stri_conv(x, "UTF-8", "latin1") with y <- stri_conv(x, from = "latin1", to = "UTF-8").
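
One plausible reading of the output above: Encoding(x) reports "latin1", so the bytes of x are already Latin-1, and declaring "UTF-8" as the source encoding asks the converter to decode bytes that are not valid UTF-8. The sketch below illustrates that idea with base R only. It swaps in iconv() for stri_conv(), and builds the Latin-1 bytes by hand (a shortened string, "Émigré") so that the result does not depend on the session locale. It is an illustration under those assumptions, not a diagnosis of the vignette failure.

```r
# Build "Émigré" as Latin-1 bytes explicitly, so the example behaves the
# same regardless of the source file encoding or the session locale.
x <- rawToChar(as.raw(c(0xC9, 0x6D, 0x69, 0x67, 0x72, 0xE9)))
Encoding(x) <- "latin1"

# validUTF8() ignores the declared encoding and inspects the raw bytes.
# A lone 0xC9 or 0xE9 byte is not a valid UTF-8 sequence, which is why
# treating these bytes as UTF-8 input cannot succeed.
x_is_utf8 <- validUTF8(x)

# Converting from the encoding the bytes are actually in works:
y <- iconv(x, from = "latin1", to = "UTF-8")
y_is_utf8 <- validUTF8(y)
y_encoding <- Encoding(y)

# Each accented letter is now a two-byte UTF-8 sequence.
y_bytes <- charToRaw(y)
```

Here x_is_utf8 should be FALSE and y_is_utf8 TRUE, which matches the observation that the conversion only succeeds with from = "latin1" and to = "UTF-8".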

Some details about my machine from sessionInfo():

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Dutch_Belgium.1252 
[2] LC_CTYPE=Dutch_Belgium.1252   
[3] LC_MONETARY=Dutch_Belgium.1252
[4] LC_NUMERIC=C                  
[5] LC_TIME=Dutch_Belgium.1252

hadley · 2023-07-31T22:56:54Z

I assume that this has been fixed.

sbearrows added the Windows 🪟 label Apr 7, 2022

hadley closed this as completed Jul 31, 2023

jennybc mentioned this issue Aug 1, 2023

Bring back the encoding example #1504

Merged
