Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle xls record type BoolErr (number 517); fixes #259, relates to #… #274

Merged
merged 2 commits into from Feb 26, 2017
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
@@ -1,5 +1,7 @@
# readxl 0.1.1.9000

* Boolean cells are now detected in xls. Suppresses message `"Unknown type: 517"`. For now, Boolean cells in both xls and xlsx become numeric, with values 0 and 1. (#274, #259 @jennybc)

* xls files written by some third party software report both row and column dimensions as 0-indexed, which prevents libxls from reading the last column. Small change to libxls restores access to those cells. (#273, #180, #152, #99 @jennybc)

* `col_types = "list"` loads data as a list of length-1 vectors, that are typed using the logic from `col_types = NULL`, but on a cell-by-cell basis (#262 @gergness).
Expand Down
44 changes: 34 additions & 10 deletions src/XlsCell.h
Expand Up @@ -34,16 +34,29 @@ inline CellType cellType(const xls::st_cell::st_cell_data cell,
xls::st_xf* styles,
const std::set<int>& customDateFormats,
const StringSet &na = "") {
// Find codes in [MS-XLS] S2.3.2 (p175).
// [MS-XLS] - v20161017, Release: October 17, 2016
//
// In 2.2.1 Cell Table on p80:
// "Cells are specified by any of the records specified in the CELL rule."
// (section 2.1.7.20.6).
// In 2.1.7.20.6 on p74, here is the CELL rule:
// CELL = FORMULA / Blank / MulBlank / RK / MulRk / BoolErr / Number / LabelSst
//
// 2.3 Record Enumeration
// Has 2 tables associating each record type value with a name and number.
// 2.3.1 starting p168 is ordered by name
// 2.3.2 starting p180 is ordered by number
//
// See xls_addCell for those used for cells
// and xlsstruct.h to confirm record numbers
switch(cell.id) {
case 253: // LabelSst
case 516: // Label
case 253: // 0x00FD LabelSst
case 516: // 0x0204 Label
return na.contains((char*) cell.str) ? CELL_BLANK : CELL_TEXT;
break;

case 6: // formula
case 1030: // formula (Apple Numbers Bug)
case 6: // 0x0006 formula
case 1030: // 0x0406 formula (Apple Numbers Bug)
if (cell.l == 0) {
return na.contains(cell.d) ? CELL_BLANK : CELL_NUMERIC;
} else {
Expand All @@ -55,9 +68,9 @@ inline CellType cellType(const xls::st_cell::st_cell_data cell,
}
break;

case 189: // MulRk
case 515: // Number
case 638: // Rk
case 189: // 0x00BD MulRk
case 515: // 0x0203 Number
case 638: // 0x027E Rk
{
if (na.contains(cell.d))
return CELL_BLANK;
Expand All @@ -70,11 +83,22 @@ inline CellType cellType(const xls::st_cell::st_cell_data cell,
}
break;

case 190: // MulBlank
case 513: // Blank
case 190: // 0x00BE MulBlank
case 513: // 0x0201 Blank
return CELL_BLANK;
break;

case 517: // 0x0205 BoolErr
{
if (!strcmp((char *) cell.str, "bool")) {
// switch to CELL_LOGICAL once exists; cell.d is 0/1 for FALSE/TRUE
return CELL_NUMERIC;
} else {
return CELL_TEXT;
}
}
break;

default:
Rcpp::Rcout << "Unknown type: " << cell.id << "\n";
return CELL_NUMERIC;
Expand Down
Binary file modified tests/testthat/sheets/types.xls
Binary file not shown.
Binary file modified tests/testthat/sheets/types.xlsx
Binary file not shown.
29 changes: 8 additions & 21 deletions tests/testthat/test-col-types.R
Expand Up @@ -11,7 +11,7 @@ test_that("illegal col_types are rejected", {
test_that("request for 'blank' col type gets deprecation message and fix", {
expect_message(
read_excel(test_sheet("types.xlsx"),
col_types = rep_len(c("blank", "text"), length.out = 5)),
col_types = rep_len(c("blank", "text"), length.out = 6)),
"`col_type = \"blank\"` deprecated. Use \"skip\" instead.",
fixed = TRUE
)
Expand Down Expand Up @@ -68,27 +68,17 @@ test_that("types imputed & read correctly [xlsx]", {
expect_is(types$boolean, "numeric")
expect_is(types$date, "POSIXct")
expect_is(types$string_in_row_3, "character")
skip("switch expecation to logical (vs numeric) when possible")
skip("switch expectation to logical (vs numeric) when possible (for xls too!)")
})

test_that("types imputed & read correctly [xls]", {
expect_output(
## valgrind reports this
## Conditional jump or move depends on uninitialised value(s)
types <- read_excel(test_sheet("types.xls")),
"Unknown type: 517"
## definitely due to these 'Unknown type: 517' msgs
## line 52 in ColSpec.h
## Rcpp::Rcout << "Unknown type: " << cell.id << "\n";
## if I skip this test, memcheck report is as clean as it ever gets
## https://github.com/tidyverse/readxl/issues/259
)
types <- read_excel(test_sheet("types.xls"))
expect_is(types$number, "numeric")
expect_is(types$string, "character")
#expect_is(types$boolean, "numeric")
expect_is(types$boolean, "numeric")
#expect_is(types$date, "POSIXct")
expect_is(types$string_in_row_3, "character")
skip("revisit these expectations when xls record type 517 is handled")
skip("activate date expectation when xls formula dates are sorted")
})

test_that("guess_max is honored for col_types", {
Expand All @@ -97,12 +87,9 @@ test_that("guess_max is honored for col_types", {
"expecting numeric"
)
expect_identical(types$string_in_row_3, c(1, 2, NA))
expect_output(
expect_warning(
types <- read_excel(test_sheet("types.xls"), guess_max = 2),
"expecting numeric"
),
"Unknown type: 517"
expect_warning(
types <- read_excel(test_sheet("types.xls"), guess_max = 2),
"expecting numeric"
)
expect_identical(types$string_in_row_3, c(1, 2, NA))
})
Expand Down