New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust logic for xlsx sheet lookup; fixes #104, relates to #80 #233

Merged
merged 15 commits into from Jan 24, 2017

Conversation

Projects
None yet
3 participants
@jennybc
Member

jennybc commented Jan 23, 2017

Current lookup assumes that xml for sheet i is in xl/worksheets/sheeti.xml. But it's not that simple. Instead we need to consult the i-th sheet node in xl/workbook.xml to learn the sheet's Id, e.g. rId5. Then lookup the target xml file based on this Id in xl/_rels/workbook.xml.rels.

When does this matter? Various non-Excel tools write xlsx affected by this. And it even happens in xlsx written by Excel if there are embedded chartsheets.

Thx @jimhester for writing most of this while we worked together last week.

Real example of i vs worksheet name vs Id vs xml target file (reflects test workbook sheet-xml-lookup.xlsx):

## i      name    Id                   Target
##      <chr> <chr>                    <chr>
## 1   Africa  rId3 xl/worksheets/sheet4.xml
## 2 Americas  rId4 xl/worksheets/sheet3.xml
## 3     Asia  rId5 xl/worksheets/sheet5.xml
## 4   Europe  rId6 xl/worksheets/sheet1.xml
## 5  Oceania  rId7 xl/worksheets/sheet2.xml

Browse the xml for this sheet mini-gap.xlsx.

@jennybc jennybc changed the title from More robust logic for sheet lookup; fixes #104, fixes #168, fixes #116, fixes #200, relates to #80 to More robust logic for xlsx sheet lookup; fixes #104, fixes #168, fixes #116, fixes #200, relates to #80 Jan 24, 2017

@hadley

Why is tests/testthat/sheet-xml-lookup.xls checked in? I don't think you use it anywhere.

Otherwise, looks good.

@@ -9,56 +9,126 @@
#include "zip.h"
class XlsxWorkBook {
class SheetRelations {

This comment has been minimized.

@hadley

hadley Jan 24, 2017

Member

Might be useful to add a brief comment here, summarising the purpose of this class

This comment has been minimized.

@jennybc
}
}
void parse_workbook_rels(const std::string path) {

This comment has been minimized.

@hadley

hadley Jan 24, 2017

Member

Missing &

This comment has been minimized.

@jennybc
@@ -9,3 +9,47 @@ test_that("excel_sheets returns utf-8 encoded text", {
# maybe libxls isn't reencording? It's a bit suspicious that mu is
# \u00b5 and libxls is giving \xb5
})
test_that("informative error when requesting non-existent sheet by name", {
expect_error(read_excel("iris-excel.xlsx", sheet = "tulip"),

This comment has been minimized.

@hadley

hadley Jan 24, 2017

Member

Can you please indent long lines like:

foo(
  a,
  b
)

This comment has been minimized.

@jennybc
## #104, #168, some of #80
## #116, #200 (chartsheet case)
test_that("sheet data xml target is explicitly looked up [xlsx]", {

This comment has been minimized.

@hadley

hadley Jan 24, 2017

Member

We have a mild convention to put issue numbers in the test name: "looked up (#104, #168)" ...

You might also consider consolidating all those issues into one so you can just refer to a single number.

This comment has been minimized.

@jennybc

jennybc Jan 24, 2017

Member

I have made #104 the main issue for this now.

This comment has been minimized.

@jennybc

@jennybc jennybc changed the title from More robust logic for xlsx sheet lookup; fixes #104, fixes #168, fixes #116, fixes #200, relates to #80 to More robust logic for xlsx sheet lookup; fixes #104, relates to #80 Jan 24, 2017

@jennybc

This comment has been minimized.

Member

jennybc commented Jan 24, 2017

I removed tests/testthat/sheet-xml-lookup.xls (97c12a5) and addressed the other comments.

I'm going to interpret "otherwise, looks good" as permission to merge this myself.

@jennybc jennybc merged commit 55760ca into tidyverse:master Jan 24, 2017

4 checks passed

codecov/patch 94.28% of diff hit (target 59.88%)
Details
codecov/project 60.63% (+0.75%) compared to 989c832
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@jennybc jennybc deleted the jennybc:iss104_worksheet-lookup branch Mar 25, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment