Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid sheet xml (no <worksheet>) #202

Closed
PPhilipp85 opened this issue Sep 14, 2016 · 19 comments
Closed

Invalid sheet xml (no <worksheet>) #202

PPhilipp85 opened this issue Sep 14, 2016 · 19 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@PPhilipp85
Copy link

Hello,
a software of a supplier of my company creates .xlsx files that cannot be treated with readxl.
I get: Error: Invalid sheet xml (no )

If I open those files and re-save them with my Excel2007, read_excel works.
I can see that the xml structure between the original and the re-saved files differs. Due to confidential reasons, I cannot share the full file (I would need to erase some information and so on...), but I have attached one xml sheet file for the original not working and the re-saved working version.

Xml_Sheet_Examples.zip

Hope you can find something out.

Many Thanks
Peter

@jennybc
Copy link
Member

jennybc commented Sep 14, 2016

There are no differences between the two XML files in the zip archive. Chances are the difference is not in the XML for worksheet data (what you sent) but rather in one or more of the other XML files that make up xlsx. Likely related to #80 #104 #168 #200.

@PPhilipp85
Copy link
Author

Sorry, I messed it up ... Please have a look at these files:
XML_Sheet_Examples_v2.zip

@jennybc
Copy link
Member

jennybc commented Sep 14, 2016

OK yes these are different now. Do you know or can you find out the name of the software that authors these? Have you tried opening them with other Excel-reading R packages? Because it would be informative to know which ones can/cannot open the original xlsx files.

For the moment, I suspect you might have more immediate success with xlsx or XLConnect, which both wrap the Apache POI. Or keep opening and re-saving to use with readxl.

@PPhilipp85
Copy link
Author

I'm afraid I cannot share the software's name... But I think it would not help you because it comes with a certain equipment we use. Commercial and very specific.

I've tried to open the files in "openxlsx", which ends up with a similar failure mode.
The POI-based packages "xlsx" and "XLConnect" can open these type of files. Currently, I use XLConnect, but I would prefer to use readxl in the future to speed up and get rid of the external dependencies.

@jennybc
Copy link
Member

jennybc commented Sep 15, 2016

I've tried to open the files in "openxlsx", which ends up with a similar failure mode.
The POI-based packages "xlsx" and "XLConnect" can open these type of files.

Yeah, that what I expected. There's obviously a set of ayptical xlsx files that Apache POI (and Excel itself) can open that readxl (and openxlsx) currently cannot. I think it can be fixed but I doubt the fix is imminent.

@PPhilipp85
Copy link
Author

Ok. Thanks for the quick feedback!

@openclosure
Copy link

I have an identical issue, also with files produced by extremely niche commercial software. Since a fix is not on the horizon and for my project I cannot rely on the Java dependencies of working packages, I thought it might be useful to share my workaround: Below is a function that will use a VBS to silently convert the troublesome XLSX files into CSV. It is not perfect, obviously it depends on having Excel installed and is Windows-only. It will also leave a .vbs file in the directory of the original xlsx file. It works quickly enough for my purposes (1-2 seconds per CSV) but if you need it be faster you can change system(wait = F) and loop it. Hope this is helpful to someone.


# Converts an xlsx file to a csv file using VBA
convert_xlsx_to_csv <- function(file_path, output_path){

  # Create VBA macro file
  if(!file.exists(file.path(dirname(file_path), "vba_csv_converter_macro.vbs"))){

    vba_csv_converter <-           
      "if WScript.Arguments.Count < 2 Then
    WScript.Echo \"Error! Please specify the source path and the destination. Usage: XlsToCsv SourcePath.xlsx Destination.csv\"
    Wscript.Quit
    End If
    Dim oExcel
    Set oExcel = CreateObject(\"Excel.Application\")
    oExcel.DisplayAlerts = False
    Dim oBook
    Set oBook = oExcel.Workbooks.Open(Wscript.Arguments.Item(0))
    oBook.SaveAs WScript.Arguments.Item(1), 6 
    oBook.Close False
    oExcel.Quit"

    write(vba_csv_converter, file = file.path(dirname(file_path), "vba_csv_converter_macro.vbs"))

  } 

  # Create Paths / Shell Script
  macro_path_norm  <- shQuote(normalizePath(file.path(dirname(file_path), "vba_csv_converter_macro.vbs"), mustWork = F))
  file_path_norm   <- shQuote(normalizePath(file_path  , mustWork = F))
  output_path_norm <- shQuote(normalizePath(output_path, mustWork = F))
  script_string <- paste("cscript //Nologo //B //H:cscript", macro_path_norm, file_path_norm, output_path_norm)

  # Exec script
  system(script_string, show.output.on.console = F, wait= T)

}

@jennybc jennybc added bug an unexpected problem or unintended behavior sheet_rels labels Jan 7, 2017
@jennybc
Copy link
Member

jennybc commented Jan 20, 2017

@PPhilipp85 Different xml files would be more helpful to me. Can I see these files instead, for not-working and working?

  • xl/workbook.xml
  • xl/_rels/workbook.xml.rels

@jennybc
Copy link
Member

jennybc commented Jan 20, 2017

Note to self: snippet of the original_notWorking.xml I have at this point, with the non-standard x namespace prefix:

<x:worksheet xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:x14ac="http://schemas.microsoft.com/office/spreadsheetml/2009/9/ac" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main" mc:Ignorable="x14ac">
<x:dimension ref="A1:N6"/>
<x:sheetViews>
<x:sheetView workbookViewId="0"/>
</x:sheetViews>
<x:sheetFormatPr baseColWidth="10" defaultRowHeight="15" x14ac:dyDescent="0.25"/>
<x:sheetData>...</x:sheetData>
<x:autoFilter ref="A1:N1"/>
<x:pageMargins left="0.7" right="0.7" top="0.48" bottom="0.48" header="0.3" footer="0.3"/>
</x:worksheet>

@PPhilipp85
Copy link
Author

@jennybc Please see below the requested files.
readxl_Troubleshooting.zip

Please note that reading sheet1 also fails because it is not named sheet1.xml in the original not working file. It is named sheet.xml instead. Strange ... but thats how it is. The following sheets are then sheet2.xml, sheet3.xml and so on.

I also tried to take sheet1.xml from the re-saved working xlsx and inserted it in the original not working xlsx. This also brings erros:
1: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[1, 1]: Invalid string id 0
2: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[1, 2]: Invalid string id 1
3: In read_xlsx_(path, sheet, col_names = col_names, col_types = col_types, :
[1, 3]: Invalid string id 2
To me this means that the failur is not only due to the xml structure with the x namespace prefix.

@jennybc
Copy link
Member

jennybc commented Jan 25, 2017

Did you mean to close this?

No, there are namespace problems. You've never provided a full workbook, but these namespace issues run throughout several xml files and affect. It's pretty difficult to do the necessary surgery, as you're attempting above, by hand.

@PPhilipp85
Copy link
Author

Sorry... Closed it by mistake.

@PPhilipp85 PPhilipp85 reopened this Jan 25, 2017
@PPhilipp85
Copy link
Author

I will see if I can provide a full workbook.

@jennybc
Copy link
Member

jennybc commented Mar 20, 2017

@etrippler @PPhilipp85 I believe I have a fix for this in an unpushed branch. But some live, problematic workbooks from one or both of you would be very helpful to me. I'd be happy to just open them once, to verify my fix works, then delete. You could provide them to me privately, i.e. not via GitHub.

@PPhilipp85
Copy link
Author

PPhilipp85 commented Mar 21, 2017 via email

@jennybc
Copy link
Member

jennybc commented Mar 21, 2017

@PPhilipp85 You can email me at jenny@stat.ubc.ca.

And/or you can try the current dev version out on some of your problematic sheets. I just merged the putative fix! I suppose if all is well, we are OK. But if not, I would appreciate a real example for more troubleshooting.

@openclosure
Copy link

Hi @jennybc, sorry for the late reply to your request. I tried out the dev version and it still doesn't appear to work for me.

Couldn't find 'xl/styles.xml' in ...

I can still send you a problematic workbook via email if you are curious to check it out.

@jennybc
Copy link
Member

jennybc commented Mar 24, 2017

@etrippler Yes please do send me a problematic workbook.

@jennybc
Copy link
Member

jennybc commented Mar 25, 2017

@etrippler is sorted out now -- needed to install latest dev version.

@lock lock bot locked and limited conversation to collaborators Oct 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants