Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to open xls files: "libxls error: Unable to open file" #598

Closed
Brunox13 opened this issue Dec 8, 2019 · 40 comments
Closed

Unable to open xls files: "libxls error: Unable to open file" #598

Brunox13 opened this issue Dec 8, 2019 · 40 comments

Comments

@Brunox13
Copy link

Brunox13 commented Dec 8, 2019

I am unable to load xls files (an example file can be downloaded here) using readxl and get the following error:

# Minimal reproducible example:
setwd("/Users/USER/Desktop")
library(readxl)
data <- read_xls(path="test_file.xls")
Error: 
  filepath: /Users/USER/Desktop/test_file.xls
  libxls error: Unable to open file

I have made sure that the file exists and is of the correct format

# Testing existence & format of the file
file.exists("test_file.xls")
[1] TRUE
format_from_ext("test_file.xls")
[1] "xls"
format_from_signature("test_file.xls")
[1] "xls"

I have already discussed this on StackOverflow and it seems that others are able to read this file - but I cannot figure out how to alleviate the problem with my setup, nor how to better diagnose it. I am able to read the file using gdata::read.xls (though I'm hoping that I'd be able to do so faster with read_xls, as I need to open multiple files); read_excel gives me the same error. I have already tried uninstalling readxl, restarting RStudio, and installing readxl again - to no avail.

I am using the following:

Mac OS 10.15.1
RStudio 1.2.5019
R 3.6.1
readxl 1.3.1

@jennybc
Copy link
Member

jennybc commented Dec 8, 2019

I get the same result as you with readxl.

I wonder if the people who succeed with readxl have opened the file once with Excel and re-saved it 🤔

I need to come back here and see if I can read it with the stand-alone tool using libxls to go from xls to csv. That would rule out readxl as the problem. I suspect this is what I will find.

@jimhester
Copy link
Contributor

Any chance that USER contains non-ASCII characters, spaces or is on a networked file system?

@Brunox13
Copy link
Author

Brunox13 commented Dec 10, 2019

Any chance that USER contains non-ASCII characters, spaces or is on a networked file system?

@jimhester USER is all lowercase letters, no spaces, and is locally on my Mac!

@Brunox13
Copy link
Author

Brunox13 commented Dec 10, 2019

I get the same result as you with readxl.

I wonder if the people who succeed with readxl have opened the file once with Excel and re-saved it 🤔

I need to come back here and see if I can read it with the stand-alone tool using libxls to go from xls to csv. That would rule out readxl as the problem. I suspect this is what I will find.

@jennybc The user (@red_quark) on StackOverflow said the following when I asked whether he opened the file in Excel first, or changed it in any way:

I didn't alter or open excel file before loading it with the readxl package. I run the code on Windows 10 Enterprise (RAM 16 GB, 64-bit). But! When I try to read your excel file with RStudio Server (Ubuntu), I get same error: libxls error: Unable to open file.

@Brunox13
Copy link
Author

I just tried opening the same file on Windows 7 and got the same libxls error: Unable to open file.

Specs:
Windows 7 Home Premium
RStudio 1.2.5019
R 3.6.1
readxl 1.3.1

@Brunox13
Copy link
Author

Brunox13 commented Dec 14, 2019

The person who is able to open the file is running the following specs:

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Windows: Windows 10 Enterprise, vers. 1803, OS build 17134.1006
Proc.: Intel Core i7-8550U, 64-bit OS
RStudio: Version 1.1.456 (July 19th, 2018)

The only thing that catches my eye is that they are using an old version of RStudio, but I have no idea if that could be the difference.

@rtburg
Copy link

rtburg commented Jan 31, 2020

I was seeing a similar error and wanted to share a short-term solution.

library(readxl)
download.file("https://mjwebster.github.io/DataJ/spreadsheets/MLBpayrolls.xls", "MLBPayrolls.xls")
MLBpayrolls <- read_excel("MLBpayrolls.xls", sheet = "MLB Payrolls", na = "n/a")

Yields (on some systems in my classroom but not others):

Error: filepath: MLBPayrolls.xls libxls error: Unable to open file

The temporary solution was to paste the URL of the xls file into Firefox and download it via the browser. Once this was done we could run the read_excel line without error.

This was happening today on Windows 10, with R 3.6.2 and R Studio 1.2.5033.

@Brunox13
Copy link
Author

I was seeing a similar error and wanted to share a short-term solution.

library(readxl)
download.file("https://mjwebster.github.io/DataJ/spreadsheets/MLBpayrolls.xls", "MLBPayrolls.xls")
MLBpayrolls <- read_excel("MLBpayrolls.xls", sheet = "MLB Payrolls", na = "n/a")

Yields (on some systems in my classroom but not others):

Error: filepath: MLBPayrolls.xls libxls error: Unable to open file

The temporary solution was to paste the URL of the xls file into Firefox and download it via the browser. Once this was done we could run the read_excel line without error.

This was happening today on Windows 10, with R 3.6.2 and R Studio 1.2.5033.

Thanks, @rtburg!
The issue described above was with files already downloaded locally, so this is likely a different matter.

@gregleleu
Copy link

gregleleu commented Feb 18, 2020

On my machine (see below) I have to use path.expand(my_path) for readxl to open xls files, no need for xlsx.

readxl::read_excel(path = path.expand("mysubfolder/myfile.xls"))

Linked to #583 ?

Hope it helps.

OS X 10.15.2,
R 3.6.2,
Rstudio 1.3.842
readxl 1.3.1

@Brunox13
Copy link
Author

Brunox13 commented Feb 18, 2020

On my machine (see below) I have to use path.expand(my_path) for readxl to open xls files, no need for xlsx.

readxl::read_excel(path = path.expand("mysubfolder/myfile.xls"))

Linked to #583 ?

Hope it helps.

OS X 10.15.2,
R 3.6.2,
Rstudio 1.3.842
readxl 1.3.1

Thanks for the suggestion, @gregleleu! Unfortunately, this makes no difference: using the full path, abbreviated path (being in the wd with the file), or using path.expand(), read_excel() still gives the same error. Did this work for you for the specific file linked above?

@gregleleu
Copy link

@Brunox13

Did this work for you for the specific file linked above?

readxl reads the file in the link above in my machine, even without using path.expand...

@jennybc
Copy link
Member

jennybc commented Feb 19, 2020

I really think this thread now has folks in it who are seeing at least 2 different phenomena.

@tnguyenqh
Copy link

I am facing the same issue with read_xls. I am running

R version 3.5.2 (2018-12-20)
R Studio 1.2.1335
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
readxl 1.3.0

@mkoohafkan
Copy link

I downloaded the file and got the error libxls error: Unable to open file. I opened the file and re-saved it. I was then able to open it without issue using readxl.

@Brunox13
Copy link
Author

I downloaded the file and got the error libxls error: Unable to open file. I opened the file and re-saved it. I was then able to open it without issue using readxl.

Thank you for your reply. I think this solution works for me as well - opening the file in MS Excel and re-saving it (either as *.xls or *.xlsx) does solve the problem. However, the original file is one of hundreds that are automatically generated during a research data collection, and so not having to run this file through Excel would be highly desirable.

@mkoohafkan
Copy link

mkoohafkan commented Mar 11, 2020

@Brunox13 for sure, opening and resaving the file is not the ideal solution. Rather I wanted to test if resaving resolved the issue. As it stands, I suspect this is a problem with the specific file, rather than a general bug in readxl or libxls.

Since you say you have hundreds of files---were these files generated from some other program? If so, I'm wondering if the program that generates these files has some bug or quirk in the way it builds the excel files that doesn't play nice with libxls.

@mkoohafkan
Copy link

mkoohafkan commented Mar 11, 2020

@Brunox13 Also maybe not an ideal solution, but you could write a simple C# program to open/save/close excel files. Here's an example for a single file, it could be easily modified to e.g. loop through files in a directory.

using System.IO;
using Excel = Microsoft.Office.Interop.Excel;

namespace resaver
{
  class Program
  {
    static void Main(string[] args)
    {
      string srcFile = Path.GetFullPath(args[0]);
      Excel.Application excelApplication = new Excel.Application();
      excelApplication.Application.DisplayAlerts = false;
      Excel.Workbook srcworkBook = excelApplication.Workbooks.Open(srcFile);
      srcworkBook.Save();
      srcworkBook.Close();
      excelApplication.Quit();
    }
  }
}

I redownloaded the file and confirmed it would not open with readxl. I ran it through this program and tried again, it worked.

@arcruz0
Copy link

arcruz0 commented Mar 16, 2020

@Brunox13 Also maybe not an ideal solution, but you could write a simple C# program to open/save/close excel files. Here's an example for a single file, it could be easily modified to e.g. loop through files in a directory.

using System.IO;
using Excel = Microsoft.Office.Interop.Excel;

namespace resaver
{
  class Program
  {
    static void Main(string[] args)
    {
      string srcFile = Path.GetFullPath(args[0]);
      Excel.Application excelApplication = new Excel.Application();
      excelApplication.Application.DisplayAlerts = false;
      Excel.Workbook srcworkBook = excelApplication.Workbooks.Open(srcFile);
      srcworkBook.Save();
      srcworkBook.Close();
      excelApplication.Quit();
    }
  }
}

I redownloaded the file and confirmed it would not open with readxl. I ran it through this program and tried again, it worked.

This can also be achieved using Libreoffice CLI:

loffice --convert-to xls --outdir some_folder some_file.xls

@jennybc jennybc added the libxls label May 11, 2020
@Myfanwy
Copy link

Myfanwy commented Jun 18, 2020

On my machine (see below) I have to use path.expand(my_path) for readxl to open xls files, no need for xlsx.

readxl::read_excel(path = path.expand("mysubfolder/myfile.xls"))

This worked for me as well; I am able read a local .xls file directly with readxl::read_excel, but as soon as I put the read_excel() call within a function, I needed to wrap the filepath in path.expand() in order to avoid the libxls error.

@shreykul
Copy link

I just opened the file once and saved it as I'm using Linux, the problem was about the file extension. LibreCalc was trying to convert it into .odt once specified that I wanted to keep the file in .xls extension, it stopped showing the error. In my opinion, you should try to open the file too.

@Brunox13
Copy link
Author

I just opened the file once and saved it as I'm using Linux, the problem was about the file extension. LibreCalc was trying to convert it into .odt once specified that I wanted to keep the file in .xls extension, it stopped showing the error. In my opinion, you should try to open the file too.

This approach has been suggested before - opening in MS Excel first also solves the problem. This solution, however, is not really feasible with the number of files that I work with (and I suppose others do as well).

@lefkiospaikousis
Copy link

I have a similar problem. It seems that my file is "Excel 2 worksheet"

I opened in Excel and gave me the following

image

Hope it helps

R version 3.6.1 (2019-07-05)
R Studio 1.2.5033
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10
readxl 1.3.1

@sanjmeh
Copy link

sanjmeh commented Nov 2, 2020

Is there a solution to this problem?
I guess older versions of xls are being rejected by the error:

Error: 
  filepath: ews/school_fac.xls
  libxls error: Unable to open file

Attaching the rejected file. Note: the file if uploaded to google drive opens easily as a googlesheet. So there is no corruption in the file.

school_fac.xls.zip

@paquetlab
Copy link

paquetlab commented Nov 16, 2020

I had the same exact problem with a bunch of .xls files and it turns out there was something wrong in all of them. I "solved" the problem by converting back all files to xls in batch mode using this command from LibreOffice:

/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to xls --outdir your_favorite_outdir *.xls

For this to work on macOS you need to install LibreOffice : https://www.libreoffice.org/download/download/

@nasrin-math
Copy link

I had the same issue reading an "xlsx" file, I simply opened the file and saved it as an xls file and the problem was solved.
Hope it helps.

@nacnudus
Copy link
Contributor

nacnudus commented Dec 9, 2020

@sanjmeh @Brunox13 You both have files that are Your file is actually in the XML format, not .xls, despite their name. See this explanation.

@DanielFiddick
Copy link

I was able to successfully read the .xls files that were giving me the same error using the read.xlsx function from the package xlsx

@GabrielSlPires
Copy link

I had the same error when opening .xls files, .xlsx works fine.

Error: 
  filepath: D:\OneDrive\Documentos\Unicamp\LCroP\Vinicius\IRGA_R\dados\2021_04_13_vini_B1_SO_cont_P1_.xls
  libxls error: Unable to open file

@MichaelFolkes
Copy link

This is a total shot in the dark. I had the same error. It turns out the .xls files were incorrectly named, and were actually tab delimited text files (who'd have thunk). Excel was ok opening them but appreciable read_excel not so much. On the rare chance you also have text files that are named .xls, try opening in a text editor to confirm.

@jennybc
Copy link
Member

jennybc commented Apr 23, 2021

If you want to test this theory that something is NOT actually xls or xlsx (despite having that extension), readxl actually does have capabilities for this (which get invoked when we need to guess the format of extension-less files).

Here's what this looks like when a tab-delimited file is "sold" to readxl as .xls.

library(readxl)

(tmp <- tempfile("readxl-can-check-magic-nunber-", fileext = ".xls"))
#> [1] "/tmp/RtmpUAwxda/readxl-can-check-magic-nunber-31e2659e970e.xls"
writeLines("X1\tX2\na\tb\n", tmp)

read.delim(tmp)
#>   X1 X2
#> 1  a  b

read_xls(tmp)
#> Error: 
#>   filepath: /private/tmp/RtmpUAwxda/readxl-can-check-magic-nunber-31e2659e970e.xls
#>   libxls error: Unable to open file

readxl:::format_from_signature(tmp)
#> [1] NA

Created on 2021-04-23 by the reprex package (v2.0.0.9000)

@jennybc
Copy link
Member

jennybc commented Apr 23, 2021

This is thread is a real mix of different problems because unfortunately many distinct problems all lead to the same libxls error: Unable to open file error from libxls.

@nskoch
Copy link

nskoch commented Aug 11, 2021

converting xls file to xlsx format did the job.

@adomingues
Copy link

If you want to test this theory that something is NOT actually xls or xlsx (despite having that extension), readxl actually does have capabilities for this (which get invoked when we need to guess the format of extension-less files).

You might on to something. I had the same error but it turns out the file as not excel but rather plain text, despite the misleading "xls" extension. data.table::fread read it without an issue.

@beevabeeva
Copy link

beevabeeva commented Nov 29, 2021

You might on to something. I had the same error but it turns out the file as not excel but rather plain text, despite the misleading "xls" extension. data.table::fread read it without an issue.

Can confirm. In my case, the mime type was txt but the file was saved as xls.

read.delim("file.txt") worked after renaming file.xls to file.txt.

@jennybc
Copy link
Member

jennybc commented Mar 19, 2022

I just confirmed that the standalone xls2csv tool built from https://github.com/libxls/libxls/releases/tag/v1.6.2 cannot read the file provided by OP.

~/rrr/libxls % xls2csv ~/rrr/readxl/investigations/test_file1.xls 
FILE: /Users/jenny/rrr/readxl/investigations/test_file1.xls
Error reading XLS file: Unable to open file

This thread became something of a mess and is presumably a mix of several different issues. But I'm closing it because the OP's file appears to be unreadable with libxls at this time.

(And, yes, often 3rd party Excel-writing software creates weird, legacy .xls files. And opening them once in Excel "fixes" them. But that's not practical when someone has hundreds of such files. Then the only solutions are to motivate the developers of the 3rd party Excel-writing software to write a more standard file or to motivate the developers of your 3rd party Excel-reading software to account for your specific weirdness. If you're still motivated @Brunox13, you could report this upstream in libxls, with an example file.

@jennybc jennybc closed this as completed Mar 19, 2022
@jennybc
Copy link
Member

jennybc commented Mar 19, 2022

Apparently I already did report this file a while ago:

libxls/libxls#104

@Hoozits728
Copy link

I downloaded the file and got the error libxls error: Unable to open file. I opened the file and re-saved it. I was then able to open it without issue using readxl.

Thank you for your reply. I think this solution works for me as well - opening the file in MS Excel and re-saving it (either as *.xls or *.xlsx) does solve the problem. However, the original file is one of hundreds that are automatically generated during a research data collection, and so not having to run this file through Excel would be highly desirable.

I had to save the file as *.xlsx and that worked for me.

@33Vito
Copy link

33Vito commented Oct 12, 2022

FYI, I had the same libxls error: Unable to open file error when using the readxl::read_excel() function to load xls files. But I can get around the issue by using the readxl::read_xlsx() function (even though its an .xls file)

@hasibahmad1995
Copy link

hasibahmad1995 commented Nov 5, 2022

> library(readxl)
> read_excel("C:/Users/username/Desktop/Cosmetics Inc.xlsx")
Worked for me

@frankiethull
Copy link

I think the fix you all are looking for is changing the mode in download.file() which will fix the temp file corruption issue.

mode = 'wb' fixed the issue I had.

path_to_file <- "blah.xls"

temp <- tempfile(fileext = ".xls")
download.file(url = path_to_file, destfile = temp, mode = "wb")

df <- readxl::read_excel(path = temp, skip = 2) |>
      janitor::clean_names()
      

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests