New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong worksheet retrieved from xlsx file exported from Google Sheets #104

Closed
jennybc opened this Issue May 23, 2015 · 2 comments

Comments

Projects
None yet
2 participants
@jennybc
Member

jennybc commented May 23, 2015

When I use read_excel() on an xlsx file obtained by downloading a Google Sheet as xlsx, the sheet = argument doesn't behave as expected. I don't get the sheet I request, regardless of whether I provide an integer or sheet name. If I open the file in Excel, re-save and try again, everything works fine. I've confirmed this with xlxs files from two different Google Sheets. I don't know if Google Sheets exports weird/broken xlsx or what?

Here's a complete example:

https://gist.github.com/jennybc/279ab4675f5f4d275350#file-2015-05-23_read-excel-sheet-md

@richfitz

This comment has been minimized.

richfitz commented Dec 16, 2015

With the example Jenny posted, it does not matter if integer or string numbers are used becuase readxl:::xlsx_sheets does the right thing there.

The issue is that for google sheets generated files the assumptions about numbering of the worksheet files within the bundle is incorrect. The constructor for XlsxWorkSheet makes the assumption that the ith sheet is in the file:

std::string sheetPath = tfm::format("xl/worksheets/sheet%i.xml", i + 1);

but the correct place to look for the integer -> filename mapping seems to be in the xl/_rels/workbook.xml.rels file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/sharedStrings" Target="sharedStrings.xml"/>
<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet3.xml"/>
<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet4.xml"/>
<Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
<Relationship Id="rId5" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet5.xml"/>
<Relationship Id="rId7" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/>
</Relationships>

It is possible that the order here is the Correct order (I've not looked at the spec) but this is the same order as the workbook. So if we want the "first" sheet it is actually "worksheets/sheet3.xml" rather than "worksheets/sheet1.xml".

@jennybc

This comment has been minimized.

Member

jennybc commented May 22, 2016

Poking around in xlsx for my own purposes. I am staring at a clean table of worksheet info, so recording here in case it is helpful.

A data frame with one row per worksheet in mini-gap.xlsx (a Google Sheet version).

## $sheets_df
## Source: local data frame [5 x 5]
## 
##   sheetId     name    Id                   Target
##     <int>    <chr> <chr>                    <chr>
## 1       1   Africa  rId3 xl/worksheets/sheet4.xml
## 2       2 Americas  rId4 xl/worksheets/sheet3.xml
## 3       3     Asia  rId5 xl/worksheets/sheet5.xml
## 4       4   Europe  rId6 xl/worksheets/sheet1.xml
## 5       5  Oceania  rId7 xl/worksheets/sheet2.xml
  • sheetId: from xl/workbook.xml
  • name: from xl/workbook.xml
  • Id: from xl/_rels/workbook.xml.rels; a.k.a. id in xl/workbook.xml
  • Target: from xl/_rels/workbook.xml.rels

xl/workbook.xml links worksheet name or integer index to Id and xl/_rels/workbook.xml.rels links Id to target xml.

@jennybc jennybc changed the title from Puzzling behaviour of sheet argument on xlsx file exported from Google Sheets to Wrong worksheet retrieved from xlsx file exported from Google Sheets Jan 17, 2017

@jennybc jennybc closed this in 55760ca Jan 24, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment