Cannot read minimalistic XLSX file #506

Closed
j6t opened this Issue Sep 13, 2018 · 3 comments

j6t commented Sep 13, 2018

read_excel fails to read the file styles-sharedStrings-absent.xlsx. The file is accepted by Microsoft Excel and LibreOffice.

The error message is

Error in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet,  :
  Evaluation error: Couldn't find '' in 'D:\Src\readxl\tests\testthat\sheets\styles-sharedStrings-absent.xlsx'.

library(readxl)
x <- read_excel("styles-sharedStrings-absent.xlsx")
#> Error in read_fun(path = enc2native(normalizePath(path)), sheet_i = sheet,  :
#>  Evaluation error: Couldn't find '' in 'D:\Src\readxl\tests\testthat\sheets\styles-sharedStrings-absent.xlsx'.
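
Since an .xlsx file is a ZIP package, one way to see what is unusual about such a file is to list the parts it contains. The following is a minimal sketch using Python's standard `zipfile` module as a neutral inspection tool outside R. `missing_optional_parts` is a hypothetical helper, not part of readxl, and the part names `xl/styles.xml` and `xl/sharedStrings.xml` are only the conventional locations (a package may name them differently through its relationship items).

```python
import io
import zipfile

# Conventional part names; a package is free to use other names via its
# relationship items, so treat these as an assumption.
CONVENTIONAL_PARTS = {
    "styles": "xl/styles.xml",
    "sharedStrings": "xl/sharedStrings.xml",
}


def missing_optional_parts(xlsx):
    """Return the labels of conventional optional parts absent from the package.

    `xlsx` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as package:
        present = set(package.namelist())
    return sorted(
        label for label, part in CONVENTIONAL_PARTS.items() if part not in present
    )


# Demo on an in-memory archive that mimics the layout described in the report:
# a workbook and one sheet, but neither styles nor shared strings.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as package:
    package.writestr("xl/workbook.xml", "<workbook/>")
    package.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")

print(missing_optional_parts(io.BytesIO(demo.getvalue())))
```

Passing the path of a real file instead of the in-memory buffer works the same way.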

j6t pushed a commit to j6t/readxl that referenced this issue Sep 13, 2018

Johannes Sixt
Accept XLSX files that do not contain a "styles" part; fixes tidyverse#506

An XLSX file that does not contain a "styles" part triggers this error:

  > read_excel("foo.xlsx", sheet="Sheet1")
  Error in sheets_fun(path) :
    Evaluation error: Couldn't find '' in 'foo.xlsx'.

To fix this, check for the presence of the part in cacheDateFormats() in
the same way that cacheStringTable() does.
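
The guard described in this commit message can be illustrated outside readxl. The sketch below is written in Python and is not the package's own code. `date_style_indices` and `make_package` are hypothetical names, `xl/styles.xml` is assumed to be the location of the styles part, and only built-in number formats are considered (custom `numFmt` entries are ignored for brevity).

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Built-in number format IDs that denote dates or times: 14-22 and 45-47.
BUILTIN_DATE_IDS = set(range(14, 23)) | {45, 46, 47}


def date_style_indices(package, styles_part="xl/styles.xml"):
    """Return indices of cell formats (xf) that use a built-in date format.

    If the styles part is absent, return an empty set instead of failing.
    """
    if styles_part not in package.namelist():
        return set()  # the guard: nothing to cache when there is no styles part
    root = ET.fromstring(package.read(styles_part))
    xfs = root.findall("m:cellXfs/m:xf", NS)
    return {
        index
        for index, xf in enumerate(xfs)
        if int(xf.get("numFmtId", "0")) in BUILTIN_DATE_IDS
    }


def make_package(parts):
    """Build an in-memory ZIP from a {name: content} dict and open it for reading."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, body in parts.items():
            archive.writestr(name, body)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)


STYLES = (
    '<styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
    '<cellXfs count="2"><xf numFmtId="0"/><xf numFmtId="14"/></cellXfs>'
    "</styleSheet>"
)

with_styles = make_package({"xl/workbook.xml": "<workbook/>", "xl/styles.xml": STYLES})
without_styles = make_package({"xl/workbook.xml": "<workbook/>"})

print(date_style_indices(with_styles))     # second xf uses format 14 (a date)
print(date_style_indices(without_styles))  # no styles part: empty result, no error
```

Returning an empty result when the part is missing means every cell is treated as having no date format, which matches the idea of falling back to defaults rather than raising an error.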

jennybc commented Sep 13, 2018

Thanks for providing an example.

Since I like to keep track of this stuff when I can, how does one get such a file in the real world, i.e. without a styles part?

j6t commented Sep 13, 2018

Viscovery SOMine generates these files. It is predictive modeling software developed by the company I work for.

jennybc commented Dec 12, 2018

For posterity. My reading of ECMA-376 5th edition Part 1 confirms that the Styles Part is not an absolute requirement.

12.2 Package Structure

A SpreadsheetML package shall contain a package-relationship item and a content-type item. The package-relationship item shall have implicit relationships with targets of the following type:

  • One Workbook part (12.3.23).

The package-relationship item is permitted to have implicit relationships with targets of the following type:

  • Digital Signature Origin (§15.2.7)
  • File Property parts (§15.2.12) (Application-Defined File Properties, Core File Properties, and Custom File
    Properties), as appropriate.
  • Thumbnail (§15.2.16).

The required and optional relationships between parts are defined in §12.3 and its subordinate clauses.

and eventually ...

12.3.20 Styles Part

An instance of this part type contains all the characteristics for all the cells in the workbook. Such information includes numeric and text formatting, alignment, font, color, and border.

A package shall contain no more than one Styles part, and that part shall be the target of an implicit relationship from the Workbook (§12.3.23) part.
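
Under that reading, a package with only a content-type item, a package-relationship item, a workbook, and one worksheet should be acceptable. The following is a minimal sketch, in Python, of writing such a file for testing a reader. `write_minimal_xlsx` is a hypothetical helper, the part names follow the usual convention, and the result has not been checked here against Excel, LibreOffice, or readxl. Text cells use inline strings so that no shared strings part is needed either.

```python
import io
import zipfile

PARTS = {
    "[Content_Types].xml": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
        '<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="xml" ContentType="application/xml"/>'
        '<Override PartName="/xl/workbook.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>'
        '<Override PartName="/xl/worksheets/sheet1.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>'
        "</Types>"
    ),
    "_rels/.rels": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="xl/workbook.xml"/>'
        "</Relationships>"
    ),
    "xl/workbook.xml": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" '
        'xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
        '<sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>'
        "</workbook>"
    ),
    "xl/_rels/workbook.xml.rels": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>'
        "</Relationships>"
    ),
    "xl/worksheets/sheet1.xml": (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<sheetData>"
        '<row r="1"><c r="A1" t="inlineStr"><is><t>x</t></is></c></row>'
        '<row r="2"><c r="A2"><v>1</v></c></row>'
        "</sheetData>"
        "</worksheet>"
    ),
}


def write_minimal_xlsx(target):
    """Write a workbook with no styles part and no shared strings part.

    `target` may be a filesystem path or a writable binary file-like object.
    """
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        for name, body in PARTS.items():
            package.writestr(name, body)


buffer = io.BytesIO()
write_minimal_xlsx(buffer)
with zipfile.ZipFile(buffer) as package:
    print(sorted(package.namelist()))
```

Writing to a path such as `write_minimal_xlsx("minimal.xlsx")` gives a file that can then be passed to the reader under test.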

jennybc closed this in #528 Dec 12, 2018

jennybc added a commit that referenced this issue Dec 12, 2018

PR #505: accept .xlsx that lack a "styles" part (#528)
* Accept XLSX files that do not contain a "styles" part; fixes #506

An XLSX file that does not contain a "styles" part triggers this error:

  > read_excel("foo.xlsx", sheet="Sheet1")
  Error in sheets_fun(path) :
    Evaluation error: Couldn't find '' in 'foo.xlsx'.

To fix this, check for the presence of the part in cacheDateFormats() in
the same way that cacheStringTable() does.

* Rename test sheet; code style

* Beef up the test a bit

* Add NEWS bullet