"Special" characters encoding issues with write_* and read_* #697

@dpprdan

EDIT: Skip to my third post; everything else is a bug in base R.

There is something wrong in the way readr's write_* and read_* functions deal with "special" characters (on Windows).

x <- c("€", "–", "¼", "⅛", "℅", "‰", "ö")
Encoding(x)

# "latin1" "latin1" "latin1" "UTF-8"  "UTF-8"  "latin1" "latin1"

df <- data.frame(x, stringsAsFactors = FALSE)

print(x)
# "€" "–" "¼" "⅛" "℅" "‰" "ö"

print(df)
#          x
# 1        €
# 2        –
# 3        ¼
# 4 <U+215B>
# 5 <U+2105>
# 6        ‰
# 7        ö

So apparently print() cannot deal with ⅛ and ℅ when they are in a data.frame?

Anyway, this is what readr does:

library("readr")
write_csv(df, "df_readr.csv")
read_csv("df_readr.csv")

# Parsed with column specification:
# cols(
#   x = col_character()
# )
# # A tibble: 7 x 1
#          x
#      <chr>
# 1 "\u0080"
# 2 "\u0096"
# 3        ¼
# 4 <U+215B>
# 5 <U+2105>
# 6 "\u0089"
# 7        ö

Interesting. Even more so when we look at the output of write_lines():

write_lines(df$x, "df_readr_lines.txt")
read_lines("df_readr_lines.txt")
# "\u0080" "\u0096" "¼"      "⅛"      "℅"      "\u0089" "ö"     

This matches what write_lines() writes to the file and what I see in Notepad++ (well, kind of).

So both write functions can deal with things like ⅛ and ℅ but not with the Euro symbol or the en dash (U+2013).
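
The escapes in the read-back output line up with Windows-1252 byte values: € is byte 0x80, – is 0x96 and ‰ is 0x89 in that code page. One possible reading (an assumption, not something confirmed in this report) is that the bytes of the strings marked "latin1" get treated as ISO-8859-1, where 0x80 is a control character, rather than as Windows-1252. A minimal base R sketch of that difference, using only iconv() and utf8ToInt():

```r
# Sketch only: shows how one byte decodes under two encodings.
# It does not call readr and makes no claim about readr's internals.

# A one-byte string holding 0x80, with no declared encoding.
euro_byte <- rawToChar(as.raw(0x80))

# Decoded as Windows-1252, byte 0x80 is the euro sign (U+20AC).
as_cp1252 <- iconv(euro_byte, from = "CP1252", to = "UTF-8")

# Decoded as ISO-8859-1, byte 0x80 is the C1 control character U+0080.
as_latin1 <- iconv(euro_byte, from = "ISO-8859-1", to = "UTF-8")

utf8ToInt(as_cp1252)  # 8364, i.e. 0x20AC
utf8ToInt(as_latin1)  # 128, i.e. 0x80, which R prints as "\u0080"
```

The same check with bytes 0x96 and 0x89 should show the en dash and per mille sign behaving the same way.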

Finally, the output of format_csv() is rubbish:

cat(format_csv(df))

# x
# €
# –
# ¼
# â…›
# â„…
# ‰
# ö
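
Rows 4 and 5 of this output look like UTF-8 bytes displayed as Windows-1252: ⅛ is E2 85 9B in UTF-8, and those three bytes read as Windows-1252 give â, … and ›. The base R sketch below reproduces that pattern. It only illustrates the symptom and assumes nothing about how format_csv() or cat() actually produce it.

```r
# Sketch only: reproduce the "â…›" pattern by decoding UTF-8 bytes as Windows-1252.

# U+215B (one eighth), written as an escape so the source file's encoding is irrelevant.
eighth <- enc2utf8("\u215b")

# Its UTF-8 representation is three bytes: e2 85 9b.
utf8_bytes <- charToRaw(eighth)
print(utf8_bytes)

# Reinterpret those same bytes as Windows-1252 and convert to UTF-8 for display.
mojibake <- iconv(rawToChar(utf8_bytes), from = "CP1252", to = "UTF-8")

# Three separate characters: U+00E2, U+2026 (ellipsis), U+203A (single guillemet).
utf8ToInt(mojibake)  # 226 8230 8250
```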

For comparison, here is what utils does:

write.csv(df, "df_utils.csv", fileEncoding = "UTF-8", row.names = FALSE)
read.csv("df_utils.csv", fileEncoding = "UTF-8")
#          x
# 1        €
# 2        –
# 3        ¼
# 4 <U+215B>
# 5 <U+2105>
# 6        ‰
# 7        ö

This looks the same in Notepad++.

So utils::write.csv() writes the same as what print(df) shows in the console. I'd expect this kind of consistency from readr, too.

Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.1 (2017-06-30)
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  German_Germany.1252         
#>  tz       Europe/Berlin               
#>  date     2017-07-18
#> Packages -----------------------------------------------------------------
#>  package   * version    date       source                          
#>  backports   1.1.0      2017-05-22 CRAN (R 3.4.0)                  
#>  base      * 3.4.1      2017-06-30 local                           
#>  compiler    3.4.1      2017-06-30 local                           
#>  datasets  * 3.4.1      2017-06-30 local                           
#>  devtools    1.13.2     2017-06-02 CRAN (R 3.4.0)                  
#>  digest      0.6.12     2017-01-27 CRAN (R 3.3.2)                  
#>  evaluate    0.10.1     2017-06-24 CRAN (R 3.4.0)                  
#>  graphics  * 3.4.1      2017-06-30 local                           
#>  grDevices * 3.4.1      2017-06-30 local                           
#>  hms         0.3        2016-11-22 CRAN (R 3.3.2)                  
#>  htmltools   0.3.6      2017-04-28 CRAN (R 3.4.0)                  
#>  knitr       1.16       2017-05-18 CRAN (R 3.4.0)                  
#>  magrittr    1.5        2014-11-22 CRAN (R 3.3.0)                  
#>  memoise     1.1.0      2017-05-29 Github (hadley/memoise@e372cde) 
#>  methods   * 3.4.1      2017-06-30 local                           
#>  R6          2.2.2      2017-06-17 CRAN (R 3.4.0)                  
#>  Rcpp        0.12.12    2017-07-15 CRAN (R 3.4.1)                  
#>  readr     * 1.1.1.9000 2017-07-18 Github (tidyverse/readr@3ea8199)
#>  rlang       0.1.1      2017-05-18 CRAN (R 3.4.0)                  
#>  rmarkdown   1.6        2017-06-15 CRAN (R 3.4.0)                  
#>  rprojroot   1.2        2017-01-16 CRAN (R 3.3.2)                  
#>  stats     * 3.4.1      2017-06-30 local                           
#>  stringi     1.1.5      2017-04-07 CRAN (R 3.3.3)                  
#>  stringr     1.2.0      2017-02-18 CRAN (R 3.3.3)                  
#>  tibble      1.3.3      2017-05-28 CRAN (R 3.4.0)                  
#>  tools       3.4.1      2017-06-30 local                           
#>  utils     * 3.4.1      2017-06-30 local                           
#>  withr       1.0.2      2016-06-20 CRAN (R 3.3.1)                  
#>  yaml        2.1.14     2016-11-12 CRAN (R 3.3.2)

Labels: bug (an unexpected problem or unintended behavior), reprex (needs a minimal reproducible example)
