New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dates in numeric columns #263

Closed
gergness opened this Issue Feb 13, 2017 · 7 comments

Comments

3 participants
@gergness
Contributor

gergness commented Feb 13, 2017

When a date is in a column that is either numeric or character, readxl silently includes the numeric representation of the date. Should this give a warning? See some discussion in #256

@stla

This comment has been minimized.

stla commented Feb 13, 2017

Strange, I've never seen that. With a xlsx file, I always get dates as POSIXct.

@gergness

This comment has been minimized.

Contributor

gergness commented Feb 13, 2017

Sorry, I didn't make myself very clear. This happens when a column has both text and dates in the same column.

library(readxl) 
read_excel("example.xls")
#> # A tibble: 3 × 1
#>           var1
#>          <chr>
#> 1        30-30
#> 2        20-20
#> 3 43018.000000

read_excel("example.xls", col_types = "date")
#> # A tibble: 3 × 1
#>         var1
#>       <dttm>
#> 1       <NA>
#> 2       <NA>
#> 3 2017-10-10
#> Warning messages:
#> 1: In xls_cols(path, sheet, col_names = col_names, col_types = col_types,  :
#>   Expecting date in [1, 1] got '30-30'
#> 2: In xls_cols(path, sheet, col_names = col_names, col_types = col_types,  :
#>   Expecting date in [2, 1] got '20-20'
 
#> # Or with new "list" type
read_excel("example.xls", col_types = "list")
#> # A tibble: 3 × 1
#>         var1
#>       <list>
#> 1  <chr [1]>
#> 2  <chr [1]>
#> 3 <dttm [1]>
@jennybc

This comment has been minimized.

Member

jennybc commented Feb 13, 2017

@gergness Can you also attach the examples.xls file to this issue for completeness? Is this xls specific?

@gergness

This comment has been minimized.

Contributor

gergness commented Feb 13, 2017

Ha sure, sorry for not giving the full reprex. This problem does affect both .xls and .xlsx.

I've just realized that github allows .xlsx files (I didn't check after it wouldn't allow my .xls file), so I've attached the same data as a .xlsx: example.xlsx

And here's a dropbox link for the .xls: example.xls

And perhaps even better, here's using test files from the package (must be run from the Rproject's dir):

library(readxl)
test_sheet <- function(fname) rprojroot::find_testthat_root_file("sheets", fname)

types_xlsx <- read_excel(test_sheet("types.xlsx"), col_types = rep("text", 5))
types_xlsx$date
#> [1] "38717" "38717" NA  

# Note that the "Unknown type: 517" message is unrelated.
types_xls <- read_excel(test_sheet("types.xls"), col_types = rep("text", 5))
#> Unknown type: 517
types_xls$date
#> [1] "38717.000000" "38717"        NA  
@jennybc

This comment has been minimized.

Member

jennybc commented Feb 14, 2017

Is this an exact duplicate of #259 then? I know about that.

@gergness

This comment has been minimized.

Contributor

gergness commented Feb 14, 2017

No, I just grabbed the first example spreadsheet I could find that had date formatting, so once you fix #259, if you read that data in as text it would still read: [1] "38717.000000" "38717" NA

This issue is about whether readxl should silently promote date cells to a number when they are within text or numeric columns.

When a date is encountered in a numeric column, I think that it should give a warning and set the value to NA instead of the numeric representation of the date. This would match the behavior of how it handles unexpected characters in a cell.

For a text column, I think that there is the possibility to do a little more, but I don't know how much complexity you'd be willing to add. Some possibilities are:

  1. Just give a warning, as above.
  2. Allow the user to specify a way to format the date as text (eg "%m/%d/%y" -> "10/10/2017")
  3. Read from the excel spreadsheet how Excel is formatting the date and convert it the same way.
@jennybc

This comment has been minimized.

Member

jennybc commented Mar 1, 2017

See #118 re: the issue of creating strings from dates.

I'm going to use this issue just to track the treatment of date cells within a numeric column. Which I think should become NA and throw a warning.

@jennybc jennybc changed the title from Dates in numeric and character columns to Dates in numeric columns Mar 1, 2017

@jennybc jennybc closed this in #277 Mar 5, 2017

@jennybc jennybc moved this from TODO to Done in jennybc Mar 10, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment