Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

write_excel_csv inserts 'U+FEFF' (Byte Order Mark) when append = TRUE #1075

Closed
orderlyquant opened this issue Feb 7, 2020 · 4 comments
Closed

Comments

@orderlyquant
Copy link

orderlyquant commented Feb 7, 2020

write_excel_csv inserts 'U+FEFF', Byte Order Mark (BOM) when append = TRUE, write_csv doesn't. I believe it would be preferable if neither did.

From the docs:

All columns are encoded as UTF-8. write_excel_csv() and write_excel_csv2() also include a UTF-8 Byte order mark which indicates to Excel the csv is UTF-8 encoded.

However, write_excel_csv doesn't include the BOM if there is no "append"ing.

I discovered this problem because a vlookup in Excel wasn't matching the values in the first row of each appended section.

I've included session info, but can confirm that this behavior is the same on Windows and Mac.

Note: reprex creates 2 csv files that are necessary to demonstrate problem.

library(reprex)
library(tidyverse)

simple_tbl <- tibble(
  Region = c("West", "East", "North", "South"),
  Sales = c(50, 100, 73, 28)
)

simple_write <- function(rgn, sales, out_file, use_excel_csv) {
  
  out <- tibble(
    region_summary = paste0("Sales for the ", rgn, " region: ", sales)
  )
  
  append_flag <- file.exists( out_file )
  
  if (use_excel_csv) {
    write_excel_csv(
      out,
      out_file,
      append = append_flag,
      col_names = !append_flag
    )
  } else {
    write_csv(
      out,
      out_file,
      append = append_flag,
      col_names = !append_flag
    )
  }
}

# sales_excel.csv will have U+FEFF (Byte Order Mark)
# at start of each "append"
simple_tbl %>%
  select(Region, Sales) %>%
  pwalk(~simple_write(..1, ..2, "sales_excel.csv", TRUE))

# sales_no_excel.csv will not have any U+FEFF
simple_tbl %>%
  select(Region, Sales) %>%
  pwalk(~simple_write(..1, ..2, "sales_no_excel.csv", FALSE))

Created on 2020-02-07 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 3.6.1 (2019-07-05)
#>  os       macOS Catalina 10.15.3      
#>  system   x86_64, darwin15.6.0        
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       America/New_York            
#>  date     2020-02-07                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                              
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 3.6.0)                      
#>  backports     1.1.5       2019-10-02 [1] CRAN (R 3.6.0)                      
#>  broom         0.5.4       2020-01-27 [1] CRAN (R 3.6.1)                      
#>  callr         3.4.1       2020-01-24 [1] CRAN (R 3.6.0)                      
#>  cellranger    1.1.0       2016-07-27 [1] CRAN (R 3.6.0)                      
#>  cli           2.0.1.9000  2020-02-07 [1] Github (r-lib/cli@e9f041e)          
#>  colorspace    1.4-1       2019-03-18 [1] CRAN (R 3.6.0)                      
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 3.6.0)                      
#>  DBI           1.1.0       2019-12-15 [1] CRAN (R 3.6.0)                      
#>  dbplyr        1.4.2       2019-06-17 [1] CRAN (R 3.6.0)                      
#>  desc          1.2.0       2018-05-01 [1] CRAN (R 3.6.0)                      
#>  devtools      2.2.1       2019-09-24 [1] CRAN (R 3.6.0)                      
#>  digest        0.6.23      2019-11-23 [1] CRAN (R 3.6.0)                      
#>  dplyr       * 0.8.4       2020-01-31 [1] CRAN (R 3.6.0)                      
#>  ellipsis      0.3.0       2019-09-20 [1] CRAN (R 3.6.0)                      
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 3.6.0)                      
#>  fansi         0.4.1       2020-01-08 [1] CRAN (R 3.6.0)                      
#>  forcats     * 0.4.0       2019-02-17 [1] CRAN (R 3.6.0)                      
#>  fs            1.3.1       2019-05-06 [1] CRAN (R 3.6.0)                      
#>  generics      0.0.2       2018-11-29 [1] CRAN (R 3.6.0)                      
#>  ggplot2     * 3.2.1       2019-08-10 [1] CRAN (R 3.6.0)                      
#>  glue          1.3.1       2019-03-12 [1] CRAN (R 3.6.0)                      
#>  gtable        0.3.0       2019-03-25 [1] CRAN (R 3.6.0)                      
#>  haven         2.2.0       2019-11-08 [1] CRAN (R 3.6.0)                      
#>  highr         0.8         2019-03-20 [1] CRAN (R 3.6.0)                      
#>  hms           0.5.3       2020-01-08 [1] CRAN (R 3.6.0)                      
#>  htmltools     0.4.0       2019-10-04 [1] CRAN (R 3.6.0)                      
#>  httr          1.4.1       2019-08-05 [1] CRAN (R 3.6.0)                      
#>  jsonlite      1.6.1       2020-02-02 [1] CRAN (R 3.6.0)                      
#>  knitr         1.28        2020-02-06 [1] CRAN (R 3.6.1)                      
#>  lattice       0.20-38     2018-11-04 [1] CRAN (R 3.6.1)                      
#>  lazyeval      0.2.2       2019-03-15 [1] CRAN (R 3.6.0)                      
#>  lifecycle     0.1.0       2019-08-01 [1] CRAN (R 3.6.0)                      
#>  lubridate     1.7.4       2018-04-11 [1] CRAN (R 3.6.0)                      
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 3.6.0)                      
#>  memoise       1.1.0       2017-04-21 [1] CRAN (R 3.6.0)                      
#>  modelr        0.1.5       2019-08-08 [1] CRAN (R 3.6.0)                      
#>  munsell       0.5.0       2018-06-12 [1] CRAN (R 3.6.0)                      
#>  nlme          3.1-140     2019-05-12 [1] CRAN (R 3.6.1)                      
#>  pillar        1.4.3       2019-12-20 [1] CRAN (R 3.6.0)                      
#>  pkgbuild      1.0.6       2019-10-09 [1] CRAN (R 3.6.0)                      
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 3.6.0)                      
#>  pkgload       1.0.2       2018-10-29 [1] CRAN (R 3.6.0)                      
#>  prettyunits   1.1.1       2020-01-24 [1] CRAN (R 3.6.0)                      
#>  processx      3.4.1       2019-07-18 [1] CRAN (R 3.6.0)                      
#>  ps            1.3.0       2018-12-21 [1] CRAN (R 3.6.0)                      
#>  purrr       * 0.3.3       2019-10-18 [1] CRAN (R 3.6.0)                      
#>  R6            2.4.1       2019-11-12 [1] CRAN (R 3.6.0)                      
#>  Rcpp          1.0.3       2019-11-08 [1] CRAN (R 3.6.0)                      
#>  readr       * 1.3.1       2018-12-21 [1] CRAN (R 3.6.0)                      
#>  readxl        1.3.1       2019-03-13 [1] CRAN (R 3.6.0)                      
#>  remotes       2.1.0       2019-06-24 [1] CRAN (R 3.6.0)                      
#>  reprex      * 0.3.0       2019-05-16 [1] CRAN (R 3.6.0)                      
#>  rlang         0.4.4       2020-01-28 [1] CRAN (R 3.6.1)                      
#>  rmarkdown     2.1.1       2020-02-07 [1] Github (cpsievert/rmarkdown@d587b7b)
#>  rprojroot     1.3-2       2018-01-03 [1] CRAN (R 3.6.0)                      
#>  rvest         0.3.5       2019-11-08 [1] CRAN (R 3.6.0)                      
#>  scales        1.1.0       2019-11-18 [1] CRAN (R 3.6.0)                      
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.6.0)                      
#>  stringi       1.4.5       2020-01-11 [1] CRAN (R 3.6.0)                      
#>  stringr     * 1.4.0       2019-02-10 [1] CRAN (R 3.6.0)                      
#>  testthat      2.3.1       2019-12-01 [1] CRAN (R 3.6.0)                      
#>  tibble      * 2.1.3       2019-06-06 [1] CRAN (R 3.6.0)                      
#>  tidyr       * 1.0.2       2020-01-24 [1] CRAN (R 3.6.0)                      
#>  tidyselect    1.0.0       2020-01-27 [1] CRAN (R 3.6.1)                      
#>  tidyverse   * 1.3.0       2019-11-21 [1] CRAN (R 3.6.0)                      
#>  usethis       1.5.1       2019-07-04 [1] CRAN (R 3.6.0)                      
#>  vctrs         0.2.99.9005 2020-02-07 [1] Github (r-lib/vctrs@0fc9acd)        
#>  withr         2.1.2       2018-03-15 [1] CRAN (R 3.6.0)                      
#>  xfun          0.12        2020-01-13 [1] CRAN (R 3.6.0)                      
#>  xml2          1.2.2       2019-08-09 [1] CRAN (R 3.6.0)                      
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 3.6.0)                      
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
@jimhester
Copy link
Member

jimhester commented Feb 7, 2020

A minimal reprex is

library(readr)
write_excel_csv(data.frame(a = 1), "out.csv")
write_excel_csv(data.frame(a = 2), "out.csv", append = TRUE)
readBin("out.csv", "raw", file.size("out.csv"))
#>  [1] ef bb bf 61 0a 31 0a ef bb bf 32 0a

Created on 2020-02-07 by the reprex package (v0.3.0)

@orderlyquant
Copy link
Author

orderlyquant commented Feb 7, 2020

Thanks, better example. But why close? Any comments on the underlying issue? If the intent is to create files for Excel to read, then the current functionality will create data that looks ok, but will not be treated as one would expect in Excel.

@jimhester
Copy link
Member

jimhester commented Feb 7, 2020

Because it is now fixed as of 3880d8c

@orderlyquant
Copy link
Author

orderlyquant commented Feb 7, 2020

Great. Thanks, just getting used to the flow of how issues are tracked and closed. Will click on the "closed in" link in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants