New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unwanted *printed* output from read_excel #82

Closed
felasa opened this Issue Apr 17, 2015 · 11 comments

Comments

Projects
None yet
8 participants
@felasa

felasa commented Apr 17, 2015

read_excel prints lines unrelated to the output:

file to use for example (tst_case.xls):

https://drive.google.com/uc?export=download&id=0B_h1BO3hBg0gX0lhSU5SQXBvd1E:

code:

> foo <- read_excel("./tst_case.xls")
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 a8 00 00 00 06 00 
@felasa

This comment has been minimized.

felasa commented Apr 21, 2015

Just noticed that if I save the file as xlsx then the problem goes away.

@eibanez

This comment has been minimized.

eibanez commented Apr 23, 2015

I think this has to do with the way libxls deals with named ranges (#79)

For future reference, the printf statement that creates this output is in the function xls_parseWorkBook in xls.c, under the line that contains XLS_RECORD_DEFINEDNAME

@almartin82

This comment has been minimized.

almartin82 commented Apr 23, 2015

Hi! Found another example of this.

name <- tempfile(pattern = "enr", tmpdir = tempdir(), fileext = ".zip")
tdir <- tempdir()

downloader::download("http://www.nj.gov/education/data/enr/enr11/enr.zip", dest=tname, mode="wb") 

utils::unzip(tname, exdir = tdir)

#read excel
enr_files <- utils::unzip(tname, exdir = ".", list = TRUE)
enr <- readxl::read_excel(paste0(tdir,'\\',enr_files$Name[1]))

outputs:

DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 
DEFINEDNAME: 21 00 00 01 0b 00 00 00 01 00 00 00 00 00 00 0d 3b 00 00 00 00 23 65 00 00 19 00 

Excel name manager does not report and defined names or ranges in that file name - for whatever that's worth!

Super helpful package. Thanks for all your work on this--

@eibanez

This comment has been minimized.

eibanez commented Apr 29, 2015

It looks like it could be a named range or a global function.

eibanez added a commit to eibanez/readxl that referenced this issue May 5, 2015

@krlmlr

This comment has been minimized.

Member

krlmlr commented May 8, 2015

I have filed an issue with libxls: https://sourceforge.net/p/libxls/bugs/27/

@eibanez

This comment has been minimized.

eibanez commented May 8, 2015

I already filed one and implemented it GitHub. They are debating whether to include my changes upstream.

@kendonB

This comment has been minimized.

kendonB commented Oct 23, 2015

+1 for addressing this

@sheriferson

This comment has been minimized.

sheriferson commented Dec 23, 2015

I am still running into this issue with readxl version 0.1.0

I can't find a way to suppress the DEFINEDNAME output without also capturing the output I actually care about (the contents of the Excel sheet).

Is there a temporary workaround that we can implement in our R code, or do we have to wait for an update to libxls?

Update with workaround:
I thought I would contribute my temporary workaround. It's nothing genius, just wrapping any call to a function in readxl in capture.output(..., file = "/dev/null"). If you're on Windows you need to use file = 'NUL'.

e.g. a function read_all_sheets that reads all sheets in an Excel file and returns a list of dataframes.

if (require(readxl)) {
  read_all_sheets <- function(filename, rowsToSkip = 0) {
    capture.output(sheets <- excel_sheets(filename),
                   file = "/dev/null")

    capture.output(  # I need to do this because of the crappy `DEFINEDNAME` output
      x <- lapply(sheets, function(y) read_excel(filename, sheet = y,
                                               skip = rowsToSkip)),

      file = "/dev/null")
    names(x) <- sheets
    return(x)
  }
}

This is a modification on this Stack Overflow answer.

@krlmlr

This comment has been minimized.

Member

krlmlr commented Dec 23, 2015

The upstream project, libxml, seems to be dormant: No commits for a year, no reaction to the issue I filed.

kbrose added a commit to kbrose/e-coli-beach-predictions that referenced this issue Jan 16, 2016

Fixing DEFINEDNAME bug fix
Using the fix found at
tidyverse/readxl#82 (comment)
to write the output to the null file. Should work on either
Windows, Linux, and Mac OSX at the least.
@t-kalinowski

This comment has been minimized.

t-kalinowski commented Aug 27, 2016

I recently ran into this issue. Similar in spirit to the solution above, I ended up masking the readxl functions in my global env with these silenced versions. Leaving this here in case anyone else runs into the same issue

excel_sheets <- function(path) {
  quiet_excel_sheets <- purrr::quietly(readxl::excel_sheets)
  out <- quiet_excel_sheets(path)
  if(length(c(out[["warnings"]], out[["messages"]])) == 0)
         return(out[["result"]])
  else readxl::excel_sheets(path)
}

read_excel <-  function(...) {
  quiet_read <- purrr::quietly(readxl::read_excel)
  out <- quiet_read(...)
  if(length(c(out[["warnings"]], out[["messages"]])) == 0)
    return(out[["result"]])
  else readxl::read_excel(...)
}
@arvi1000

This comment has been minimized.

arvi1000 commented Dec 11, 2016

Another +1 for this issue. I'm using the same workaround posted above for now:

quiet_read <- purrr::quietly(readxl::read_excel)
my_data <- quiet_read("my_sheet.xls")$result

I'm reading an .xls (not .xlsx) and my sessionInfo is:

Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.6 (El Capitan)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] lubridate_1.5.6  readxl_0.1.1     magrittr_1.5     data.table_1.9.6 ggplot2_2.2.0   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.6      assertthat_0.1   chron_2.3-47     grid_3.3.1       plyr_1.8.4      
 [6] gtable_0.2.0     formatR_1.4      scales_0.4.1     stringi_1.1.1    lazyeval_0.2.0  
[11] tools_3.3.1      stringr_1.1.0    munsell_0.4.3    colorspace_1.2-6 knitr_1.14      
[16] tibble_1.1 ```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment