Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updated and unable to read certain excel files now #496

Closed
dominicshore opened this issue Jul 29, 2018 · 29 comments
Closed

Updated and unable to read certain excel files now #496

dominicshore opened this issue Jul 29, 2018 · 29 comments

Comments

@dominicshore
Copy link

I am having trouble opening/reading excel files that I download from the website of the Australian Bureau of Statistics using readxl.

I've downloaded Table 12 from the website but when I go to read the sheets of the workbook in r I get an error message:

    library(readxl)    
    excel_sheets(path = "C:/Users/Name/Documents/downloaded_file.xls")

    "Error in xls_sheets(path) : Failed to open C:/Users/Name/Documents/downloaded_file.xls".

In previous versions of readxl I have had no trouble reading these files into r but I've recently updated my readxl version, after a hiatus of several months, and now it doesn't work.

I have tried to download the file using the download.file function taking care to set mode = wb but that makes no difference to being able to access the data in the workbook either.

Grateful for any pointers.

@jennybc
Copy link
Member

jennybc commented Jul 29, 2018

libxls, which readxl embeds, got some big updates lately. These have been overwhelmingly positive but we've also seen some regressions.

Is that really your error message? Specifically, I'm wondering if you've sanitized it. I'm wondering what the actual path is.

I'm busy at the Joint Statistical Meetings this week. But when that's done, I can attempt to read this file with a standalone libxls tool to determine if the problem is in the R code/package or libxls.

To get unstuck, consider going backwards (to previous CRAN version) or forwards (to current dev version on GitHub).

@dominicshore
Copy link
Author

dominicshore commented Jul 29, 2018

Hi Jenny

No, I haven't sanitised the error message at all. They read:

xpath <- 'C:/Users/Name/Downloads/6202012.xls'
> read_excel(path = xpath)
Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim,  : 
  Failed to open C:/Users/Name/Downloads/6202012.xls
> excel_sheets(xpath)
Error in xls_sheets(path) : 
  Failed to open C:/Users/Name/Downloads/6202012.xls

Thanks for the pointer. I'll try to install a different version.

@jennybc
Copy link
Member

jennybc commented Jul 30, 2018

So your username is truly Name?

I really want to know if there is any chance you have a space or other challenging characters in this path.

Is that really your error message? Specifically, I'm wondering if you've sanitized it. I'm wondering what the actual path is.

@lasseklokker
Copy link

lasseklokker commented Jul 31, 2018

I seem to have a similar issue:

data$curves_d1_d30 <- read_excel(path = 'FilePath/MonthFlow.xlsm', range = 'A11:O42', col_names = TRUE)
Error in read_fun(path = enc2native(path), sheet_i = sheet, limits = limits,  : 
  object '_readxl_read_xlsx_' not found

At first I thought the path was wrong, but running:

file.exists("FilePath/MonthFlow.xlsm")
[1] TRUE

So for me at least, this is not as simple.

/Lasse

@jimhester
Copy link
Contributor

@lasseklokker It looks like your readxl installation is broken. Please try re-installing readxl from a fresh R session. e.g. Restart R, don't load any packages and then install.packages("readxl").

@lasseklokker
Copy link

@jimhester you are absolutely correct! I works now. I had some processes starting R in the background making that realisation hard to make on my own! I have fixed it now, thanks for the suggestion!
/Lasse

@dominicshore
Copy link
Author

Hi Jenny

The username is Dominic, so no special characters I have tried to install the package from a fresh session but that doesn't seem to work for me:

>install.packages('readxl')
>library(readxl)
> file.exists('C:/Users/Dominic/Downloads/6202012.xls')
[1] TRUE

>excel_sheets('C:/Users/Dominic/Downloads/6202012.xls')
Error in excel_sheets("C:/Users/Dominic/Downloads/6202012.xls") : 
  could not find function "excel_sheets"

>read_excel('C:/Users/Dominic/Downloads/6202012.xls')
Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim,  : 
  Failed to open C:/Users/Dominic/Downloads/6202012.xls

> version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          5.1                         
year           2018                        
month          07                          
day            02                          
svn rev        74947                       
language       R                           
version.string R version 3.5.1 (2018-07-02)
nickname       Feather Spray

@jennybc
Copy link
Member

jennybc commented Aug 7, 2018

excel_sheets('C:/Users/Dominic/Downloads/6202012.xls')
Error in excel_sheets("C:/Users/Dominic/Downloads/6202012.xls") :
could not find function "excel_sheets"

This is very peculiar and suggests something is goofy with your readxl installation.

Can you run basic examples that just read example sheets that ship with the package?

@dominicshore
Copy link
Author

Can you run basic examples that just read example sheets that ship with the package?

I've run all of the basic examples from the example sheets without any issues. More than that, I can import simple spreadsheets that I've created. I used to be able to import data from the Australia Bureau of Statistics using readxl but now, no dice.

This is very peculiar and suggests something is goofy with your readxl installation.

I've reinstalled it a few times now. Any advice on next steps would be appreciated.

@jennybc
Copy link
Member

jennybc commented Aug 7, 2018

Can you apply excel_sheets() to built-in example sheets?

@dominicshore
Copy link
Author

dominicshore commented Aug 7, 2018

Yes.
`> excel_sheets(readxl_example("datasets.xlsx"))

[1] "iris" "mtcars" "chickwts" "quakes" `

Does this mean that the issue has to do with the ABS spreadsheet that I'm trying to read? Some kind of quirk in that spreadsheet that was catered for in the previous version of readxl that wasn't carried over into the new perhaps.

@jennybc
Copy link
Member

jennybc commented Aug 7, 2018

I have no idea how the target file could make excel_sheets() an unfindable function within the readxl package.

That is, I don't know how to reconcile these two results:

excel_sheets('C:/Users/Dominic/Downloads/6202012.xls')
Error in excel_sheets("C:/Users/Dominic/Downloads/6202012.xls") : 
  could not find function "excel_sheets"

and

> excel_sheets(readxl_example("datasets.xlsx"))
[1] "iris" "mtcars" "chickwts" "quakes"

What happens if you try excel_sheets('C:/Users/Dominic/Downloads/6202012.xls') in the same session where you have just successfully executed excel_sheets(readxl_example("datasets.xlsx"))?

@dominicshore
Copy link
Author

dominicshore commented Aug 7, 2018

`> excel_sheets(readxl_example("datasets.xlsx"))
[1] "iris" "mtcars" "chickwts" "quakes"

excel_sheets('C:/Users/Dominic/Downloads/6202012.xls')
Error in xls_sheets(path) :
Failed to open C:/Users/Dominic/Downloads/6202012.xls
`

Looks like one works and the other doesn't, even in the same session.

@dominicshore
Copy link
Author

Jenny - Can I ask, if you download the file and try to import it, does it work for you?

@jennybc
Copy link
Member

jennybc commented Aug 7, 2018

OK so the function excel_sheets() can be found (but fails on this specific sheet). The error in #496 (comment) must have been a red herring caused by general install / re-install upheaval.

No, I cannot import this file with the dev version readxl. Yes I suspect the problem is this:

Some kind of quirk in that spreadsheet that was catered for in the previous version of readxl that wasn't carried over into the new perhaps.

I will know for sure when I pull upstream changes from libxls and create a standalone xls2csv tool and can try to import this file without using readxl at all. More soon ...

@dominicshore
Copy link
Author

Is there anything that someone with intermediate r skills could do to help here Jenny?

@jennybc
Copy link
Member

jennybc commented Aug 18, 2018

Almost certainly no 😞

The practical move is to use the previous version of readxl, which embeds an earlier version of libxls.

I have tracked down a few of these things in libxls but that is all in C and it can be fairly painful. Once I do a readxl development push, I am fairly hopeful I can solve or heave over the wall to libxls with a good example. But that's not a focus right this moment. It will come back around.

@henrique-andrade
Copy link

henrique-andrade commented Aug 20, 2018

Dear @dominicshore and @jennybc, I can't open some Excel files too. Please take a look at the following code and output:

# R version: 3.5.1 (64 bit)
# readxl version: 1.1.0
> library(readxl)
> trade <- read_excel("C:/Users/Henrique/Desktop/Comércio com ajuste sazonal.xls")
Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim,  : 
  Failed to open C:/Users/Henrique/Desktop/Comércio com ajuste sazonal.xls
> trade <- read_excel("C:/Users/Henrique/Desktop/Comércio com ajuste sazonal.xls", skip = 3)
Error in read_fun(path = path, sheet_i = sheet, limits = limits, shim = shim,  : 
  Failed to open C:/Users/Henrique/Desktop/Comércio com ajuste sazonal.xls
> trade <- read_excel("C:/Users/Henrique/Desktop/Comércio com ajuste sazonal.xls", sheet = "COMÉRCIO", skip = 3)
Error in sheets_fun(path) : 
  Failed to open C:/Users/Henrique/Desktop/Comércio com ajuste sazonal.xls

The same error appears using R 3.5.0, but I have no problem if I use R version 3.4.4 or 3.4.3. So I think the problem is not inside the readxl function.

@jennybc, I can send the Excel file to you if you think it can help you to solve the problem.

@jimhester
Copy link
Contributor

@henrique-andrade I am pretty sure that is a separate issue already fixed in the devel version of readxl #477. It occurs due to a change in how base-R handles file paths with non-ascii characters.

@action1947
Copy link

action1947 commented Sep 13, 2018

I had a similar situation
I think "excel_sheets" can only read file within certain size. If the file is too big, it gives error.

> excel_sheets("D:\\Experiment\\Leakage current study\\Forward gate stress stastical\\2018-06-30_CG5_BVA5_MID_x\\Schottky_IV#1@1_v1.xls")
> excel_sheets("D:\\Experiment\\Leakage current study\\Forward gate stress stastical\\2018-06-30_CG5_BVA5_MID_x\\Schottky_IV#1@1_v2.xls")
> excel_sheets("D:\\Experiment\\Leakage current study\\Forward gate stress stastical\\2018-06-30_CG5_BVA5_MID_x\\Schottky_IV#1@1_v3.xls")
Error in xls_sheets(path) : 
  Failed to open D:\Experiment\Leakage current study\Forward gate stress stastical\2018-06-30_CG5_BVA5_MID_x\Schottky_IV#1@1_v3.xls

The first two files contain 120 worksheets each, and the last file is the combination version of the first two. I have no problem in reading the first two files, but it gives error when I read the last one

I've solve this problem by saving the third file as "xlsx".The file size is reduced from 7,387 KB to 3,975 KB. And when I read the "xlsx" file, it doesn't give any error.

@kaushik12
Copy link

I am having similar issues where .xls files larger than 7MB aren't read in properly but when saving as .xlsx makes it work. Is there a way to not have to do this?

@action1947
Copy link

action1947 commented Nov 6, 2018 via email

@igorstorm
Copy link

Dear, @jimhester,
I am having the same problem as @henrique-andrade
I am using R 3.5.1 and readxl 1.1.0. You said the bug was already fixed in the devel version. When this correction will come out?

@jennybc
Copy link
Member

jennybc commented Dec 14, 2018

Having embedded a newer version of libxls in dev readxl, I can now read the original sheet posted by @dominicshore:

readxl::read_excel("investigations/6202012.xls")
#> New names:
#> * `` -> `..1`
#> * `` -> `..3`
#> * `` -> `..4`
#> * `` -> `..5`
#> * `` -> `..6`
#> * … and 6 more
#> # A tibble: 755 x 12
#>    ..1   `Time Series Wo… ..3   ..4   ..5   ..6   ..7   ..8   ..9   ..10 
#>    <chr> <chr>            <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 <NA>  <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  2 <NA>  <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  3 <NA>  6202.0 Labour F… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  4 <NA>  Table 12. Labou… <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  5 <NA>  <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  6 Rela… Summary Publica… Expl… Inqu… <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  7 <NA>  <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#>  8 Data… <NA>             <NA>  Seri… Seri… Seri… Seri… No. … Unit  Data…
#>  9 <NA>  <NA>             <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>  <NA> 
#> 10 Empl… <NA>             <NA>  Trend A844… 28522 43252 485   000   STOCK
#> # … with 745 more rows, and 2 more variables: ..11 <chr>, ..12 <chr>

Created on 2018-12-13 by the reprex package (v0.2.1.9000)

This issue attracted a lot of "me too" comments. If others test their problem sheets with the dev version of readxl and still have problems, please open a new issue and link or attach the problem workbook.

@jennybc jennybc closed this as completed Dec 14, 2018
@techwithshadab
Copy link

I'm getting this error when trying to load xlsx file:

Error parsing file 'Media data and other information.xlsx'.
Error in xls2sep(xls, sheet, verbose = verbose, ..., method = method, :
Intermediate file '/tmp/RtmppKJJVg/file240cc0723c4.csv' missing!
In addition: Warning message:
running command ''/usr/bin/perl' '/home/shussain/R/x86_64-pc-linux-gnu-library/3.4/gdata/perl/xls2csv.pl' 'Media data and other information.xlsx' '/tmp/RtmppKJJVg/file240cc0723c4.csv' '2'' had status 255
Error in file.exists(tfn) : invalid 'file' argument

Any idea how to resolve this??

@jennybc
Copy link
Member

jennybc commented Sep 15, 2019

xls2sep() is not a function in the readxl package.

Based on the Perl stuff in your error, I would guess you're using gdata.

@amirhmstu

This comment has been minimized.

@amirhmstu

This comment has been minimized.

@amirhmstu

This comment has been minimized.

@tidyverse tidyverse locked as resolved and limited conversation to collaborators Sep 16, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

10 participants