Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.Sign up
Dates in numeric columns #263
Sorry, I didn't make myself very clear. This happens when a column has both text and dates in the same column.
library(readxl) read_excel("example.xls") #> # A tibble: 3 × 1 #> var1 #> <chr> #> 1 30-30 #> 2 20-20 #> 3 43018.000000 read_excel("example.xls", col_types = "date") #> # A tibble: 3 × 1 #> var1 #> <dttm> #> 1 <NA> #> 2 <NA> #> 3 2017-10-10 #> Warning messages: #> 1: In xls_cols(path, sheet, col_names = col_names, col_types = col_types, : #> Expecting date in [1, 1] got '30-30' #> 2: In xls_cols(path, sheet, col_names = col_names, col_types = col_types, : #> Expecting date in [2, 1] got '20-20' #> # Or with new "list" type read_excel("example.xls", col_types = "list") #> # A tibble: 3 × 1 #> var1 #> <list> #> 1 <chr > #> 2 <chr > #> 3 <dttm >
Ha sure, sorry for not giving the full reprex. This problem does affect both .xls and .xlsx.
I've just realized that github allows .xlsx files (I didn't check after it wouldn't allow my .xls file), so I've attached the same data as a .xlsx: example.xlsx
And here's a dropbox link for the .xls: example.xls
And perhaps even better, here's using test files from the package (must be run from the Rproject's dir):
library(readxl) test_sheet <- function(fname) rprojroot::find_testthat_root_file("sheets", fname) types_xlsx <- read_excel(test_sheet("types.xlsx"), col_types = rep("text", 5)) types_xlsx$date #>  "38717" "38717" NA # Note that the "Unknown type: 517" message is unrelated. types_xls <- read_excel(test_sheet("types.xls"), col_types = rep("text", 5)) #> Unknown type: 517 types_xls$date #>  "38717.000000" "38717" NA
No, I just grabbed the first example spreadsheet I could find that had date formatting, so once you fix #259, if you read that data in as text it would still read:
This issue is about whether readxl should silently promote date cells to a number when they are within text or numeric columns.
When a date is encountered in a numeric column, I think that it should give a warning and set the value to NA instead of the numeric representation of the date. This would match the behavior of how it handles unexpected characters in a cell.
For a text column, I think that there is the possibility to do a little more, but I don't know how much complexity you'd be willing to add. Some possibilities are: