EDIT: Skip to my third post; everything else turned out to be bugs in base.
There is something wrong in the way readr's write_* and read_* functions deal with "special" characters (on Windows).
x <- c("€", "–", "¼", "⅛", "℅", "‰", "ö")
Encoding(x)
# "latin1" "latin1" "latin1" "UTF-8" "UTF-8" "latin1" "latin1"
df <- data.frame(x, stringsAsFactors = FALSE)
print(x)
# "€" "–" "¼" "⅛" "℅" "‰" "ö"
print(df)
# x
# 1 €
# 2 –
# 3 ¼
# 4 <U+215B>
# 5 <U+2105>
# 6 ‰
# 7 ö
So apparently print() cannot deal with ⅛ and ℅ when they are in a data.frame?
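To take the console out of the picture, the code points can be checked directly. This is only a sketch using base R: the vector is rebuilt from `\u` escapes (my choice, so the script's own encoding does not matter), and `in_cp1252` is a hypothetical helper name. It also shows which characters exist in code page 1252, which lines up with the "latin1" vs. "UTF-8" marks reported by Encoding(x) above.

```r
# Same characters as x above, written as escapes so the file encoding is irrelevant.
x <- c("\u20ac", "\u2013", "\u00bc", "\u215b", "\u2105", "\u2030", "\u00f6")

# Code point of each single-character string.
codepoints <- vapply(x, function(s) utf8ToInt(enc2utf8(s)), integer(1),
                     USE.NAMES = FALSE)
labels <- sprintf("U+%04X", codepoints)
print(labels)

# iconv() returns NA for strings that cannot be represented in the target encoding.
in_cp1252 <- !is.na(iconv(x, from = "UTF-8", to = "CP1252"))
print(in_cp1252)
# TRUE TRUE TRUE FALSE FALSE TRUE TRUE
```

So ⅛ and ℅ are exactly the two characters without a CP1252 representation, which may be why only those two show up as <U+...> inside the data.frame.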
Anyway, this is what readr does:
library("readr")
write_csv(df, "df_readr.csv")
read_csv("df_readr.csv")
# Parsed with column specification:
# cols(
# x = col_character()
# )
# # A tibble: 7 x 1
# x
# <chr>
# 1 "\u0080"
# 2 "\u0096"
# 3 ¼
# 4 <U+215B>
# 5 <U+2105>
# 6 "\u0089"
# 7 ö
Interesting. Even more so when we look at the output of write_lines()
write_lines(df$x, "df_readr_lines.txt")
read_lines("df_readr_lines.txt")
# "\u0080" "\u0096" "¼" "⅛" "℅" "\u0089" "ö"
This is equivalent to what write_lines() does and what I see in Notepad++ (well kinda).
So both write functions can deal with things like ⅛ and ℅ but not with the Euro-symbol or the en-dash (U+2013).
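The three broken values are suspicious: \u0080, \u0096 and \u0089 are exactly the CP1252 byte values of €, – and ‰. A small base-R sketch of that observation follows (the variable names are mine, and this only illustrates the byte arithmetic; it does not prove what readr does internally):

```r
# The three characters that come back wrong, as escapes.
affected <- c("\u20ac", "\u2013", "\u2030")

# Their single-byte representation in code page 1252.
cp1252_bytes <- vapply(iconv(affected, from = "UTF-8", to = "CP1252"),
                       function(s) as.integer(charToRaw(s)), integer(1),
                       USE.NAMES = FALSE)
print(sprintf("0x%02X", cp1252_bytes))
# "0x80" "0x96" "0x89"

# Treating each byte value as a Unicode code point (which is what a strict
# ISO-8859-1 interpretation does) gives the control characters seen above.
misread <- intToUtf8(cp1252_bytes, multiple = TRUE)
misread_labels <- sprintf("U+%04X", vapply(misread, utf8ToInt, integer(1),
                                           USE.NAMES = FALSE))
print(misread_labels)
# "U+0080" "U+0096" "U+0089"
```

That would be consistent with "latin1"-marked strings being converted as true ISO-8859-1 rather than as CP1252 somewhere along the way, but that is an assumption on my part.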
Finally, the output of format_csv is rubbish.
cat(format_csv(df))
# x
# €
# –
# ¼
# â…›
# â„…
# ‰
# ö
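The garbage for ⅛ and ℅ looks like UTF-8 bytes being displayed as CP1252. A sketch that reproduces the same mojibake with base R only (again with escapes instead of literals; `mojibake` is just an illustrative name):

```r
# UTF-8 encoded input: U+215B is the bytes E2 85 9B, U+2105 is E2 84 85.
utf8_strings <- c("\u215b", "\u2105")

# Deliberately decode those UTF-8 bytes as if they were CP1252.
mojibake <- iconv(utf8_strings, from = "CP1252", to = "UTF-8")

# Code points of the result: a-circumflex, then two CP1252 punctuation marks each.
mojibake_points <- lapply(mojibake, utf8ToInt)
print(lapply(mojibake_points, function(p) sprintf("U+%04X", p)))
```

The first string comes out as U+00E2 U+2026 U+203A and the second as U+00E2 U+201E U+2026, i.e. the same three-character sequences shown in rows 4 and 5 above. That suggests format_csv() returns UTF-8 bytes that cat() then shows in the native encoding, although I have not checked how the result is marked.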
For comparison, here is utils:
write.csv(df, "df_utils.csv", fileEncoding = "UTF-8", row.names = FALSE)
read.csv("df_utils.csv", fileEncoding = "UTF-8")
# x
# 1 €
# 2 –
# 3 ¼
# 4 <U+215B>
# 5 <U+2105>
# 6 ‰
# 7 ö
This looks the same in Notepad++.
So utils::write.csv() writes the same as what print(df) shows in the console. I'd expect this kind of consistency from readr, too.
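Instead of relying on Notepad++ or the console, the bytes on disk can be inspected from R. The following is a base-R sketch, not readr code: it writes two of the characters as UTF-8 to a temporary file and dumps the raw bytes. The same readBin() call could be pointed at df_readr.csv or df_utils.csv to compare what each function really wrote.

```r
path <- tempfile(fileext = ".txt")
x <- c("\u20ac", "\u215b")

# Write the UTF-8 bytes unchanged; binary mode keeps "\n" as a single 0a byte.
con <- file(path, open = "wb")
writeLines(enc2utf8(x), con, useBytes = TRUE)
close(con)

# Dump the file as hex, independent of any console or editor.
bytes <- readBin(path, what = "raw", n = file.size(path))
hex <- paste(bytes, collapse = " ")
print(hex)
# "e2 82 ac 0a e2 85 9b 0a"

# Reading back while declaring the encoding restores the original strings.
back <- readLines(path, encoding = "UTF-8")
print(all(back == x))
```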
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.4.1 (2017-06-30)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate German_Germany.1252
#> tz Europe/Berlin
#> date 2017-07-18
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.0 2017-05-22 CRAN (R 3.4.0)
#> base * 3.4.1 2017-06-30 local
#> compiler 3.4.1 2017-06-30 local
#> datasets * 3.4.1 2017-06-30 local
#> devtools 1.13.2 2017-06-02 CRAN (R 3.4.0)
#> digest 0.6.12 2017-01-27 CRAN (R 3.3.2)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.4.0)
#> graphics * 3.4.1 2017-06-30 local
#> grDevices * 3.4.1 2017-06-30 local
#> hms 0.3 2016-11-22 CRAN (R 3.3.2)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
#> knitr 1.16 2017-05-18 CRAN (R 3.4.0)
#> magrittr 1.5 2014-11-22 CRAN (R 3.3.0)
#> memoise 1.1.0 2017-05-29 Github (hadley/memoise@e372cde)
#> methods * 3.4.1 2017-06-30 local
#> R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
#> Rcpp 0.12.12 2017-07-15 CRAN (R 3.4.1)
#> readr * 1.1.1.9000 2017-07-18 Github (tidyverse/readr@3ea8199)
#> rlang 0.1.1 2017-05-18 CRAN (R 3.4.0)
#> rmarkdown 1.6 2017-06-15 CRAN (R 3.4.0)
#> rprojroot 1.2 2017-01-16 CRAN (R 3.3.2)
#> stats * 3.4.1 2017-06-30 local
#> stringi 1.1.5 2017-04-07 CRAN (R 3.3.3)
#> stringr 1.2.0 2017-02-18 CRAN (R 3.3.3)
#> tibble 1.3.3 2017-05-28 CRAN (R 3.4.0)
#> tools 3.4.1 2017-06-30 local
#> utils * 3.4.1 2017-06-30 local
#> withr 1.0.2 2016-06-20 CRAN (R 3.3.1)
#> yaml 2.1.14 2016-11-12 CRAN (R 3.3.2)