Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Date/times missing a second #187

Closed
renejuan opened this issue Jul 1, 2016 · 8 comments
Closed

Date/times missing a second #187

renejuan opened this issue Jul 1, 2016 · 8 comments
Labels
bug an unexpected problem or unintended behavior datetime 📆

Comments

@renejuan
Copy link

renejuan commented Jul 1, 2016

Not sure if this is an error but if in Excel you enter 4/28/2016 11:30:00 AM and read the file using readxl [for example, readxl::read_excel("./test.xlsx",col_names = FALSE)] you end up losing a second. Also the package assumes is UTC. See related post at http://stackoverflow.com/questions/38005379/excel-date-time-imported-into-r-missing-1-second

@jennybc
Copy link
Member

jennybc commented Jan 3, 2017

Can you share an actual xlsx file and your exact code to import? If I attempt to replicate in a simple way, "4/28/2016 11:30:00 AM" is imported as character, not as a datetime.

@jennybc jennybc added the reprex needs a minimal reproducible example label Jan 3, 2017
@jennybc
Copy link
Member

jennybc commented Jan 3, 2017

Nevermind. I can reproduce if I explicitly format the cell as a date in Excel, e.g. mm/dd/yyyy h:mm:ss AM/PM.

## https://github.com/hadley/readxl/issues/187
library(readxl)
df <- readxl::read_excel("iss187_date-time-missing-second.xlsx",
                         col_names = FALSE)
df
#>                      X0                  X1
#> 1 4/28/2016 11:30:00 AM 2016-04-28 11:29:59

@jennybc jennybc added bug an unexpected problem or unintended behavior datetime 📆 and removed reprex needs a minimal reproducible example labels Jan 3, 2017
@hadley
Copy link
Member

hadley commented Feb 9, 2017

I bet this is an off-by-one error!

@nacnudus
Copy link
Contributor

Even Excel doesn't even agree with Excel.

temp

datetime.xlsx

@nacnudus
Copy link
Contributor

nacnudus commented Feb 18, 2017

This seems to be a floating point precision problem, possibly fixable with chron.

In the output below:

  • xl_date is the date Excel displays in a cell with formatting yyyy-mm-dd hh:mm:ss
  • xl_numeric is what Excel displays as the numeric value of the date cells.
  • xlsx_numeric is what Excel writes to the file as the numeric value of the date cells.
  • rcpp_date is the date in R via Rcpp, obtained from xlsx_numeric by subtracting 25569, multiplying by 86400, and assigning attr(x, "class") <- c("POSIXct", "POSIXt") and attr(x, "tzone") <- "UTC".
  • r_xl is as above, but calculated in R, not Rcpp, and using the xl_numeric value.
  • r_xlsx is as above, but calculated in R, not Rcpp, and using the xlsx_numeric value.
  • chron_xlsx is chron(xlsx_numeric).

Points to note:

  • xl_numeric and xlsx_numeric aren't the same (Excel's display is different from what it writes to file).
  • rcpp_date rows 3 and 4 are both 59 seconds. This is countered by rows 9 and 10, which are both 50 seconds.
  • r_xlsx is the same as rcpp_date, also using the xlsx_numeric value, so it's not an Rcpp/C++ issue.
  • r_xl rows 4 and 5 are both 00 seconds, countered by rows 9 and 10, which are both 50 seconds. So using the value displayed by Excel, rather than the value in the file, still doesn't resolve the problem.
  • chron_xlsx is correct. I tested every second of the day, and the diffs were all ~1.
options(digits=17)
options(scipen=999)
xl_to_date <- function(x) {
  out <- ((x - 25569) * 86400)
  attr(out, "class") <- c("POSIXct", "POSIXt")
  attr(out, "tzone") <- "UTC"
  out
}
x <- tidyxl::tidy_xlsx("~/R/tidyxl/datetime.xlsx")$data[["Sheet2"]]
x <- data.frame(xl_date = x[x$col == 1, "character"],
                xl_numeric = x[x$col == 3, "character"],
                xlsx_numeric = x[x$col == 4, "numeric"],
                rcpp_date = x[x$col == 2, "date"],
                stringsAsFactors = FALSE)
x$r_xl <- xl_to_date(as.numeric(x$xl_numeric))
x$r_xlsx <- xl_to_date(as.numeric(x$xlsx_numeric))
x$chron_xlsx <- chron::chron(as.numeric(x$xlsx_numeric))
x
#>                xl_date       xl_numeric       xlsx_numeric
#> 1  2016-12-14 11:29:57 42718.4791320602 42718.479132060187
#> 2  2016-12-14 11:29:58 42718.4791435764 42718.479143576391
#> 3  2016-12-14 11:29:59 42718.4791550926 42718.479155092595
#> 4  2016-12-14 11:30:00 42718.4791666667 42718.479166666664
#> 5  2016-12-14 11:30:01 42718.4791781250 42718.479178125002
#> 6  2016-12-14 11:30:02 42718.4791896412 42718.479189641206
#> 7  2016-12-14 11:30:48 42718.4797223380 42718.479722337965
#> 8  2016-12-14 11:30:49 42718.4797338542 42718.479733854168
#> 9  2016-12-14 11:30:50 42718.4797453704 42718.479745370372
#> 10 2016-12-14 11:30:51 42718.4797569444 42718.479756944442
#> 11 2016-12-14 11:30:52 42718.4797684028 42718.479768402780
#>              rcpp_date                r_xl              r_xlsx
#> 1  2016-12-14 11:29:57 2016-12-14 11:29:57 2016-12-14 11:29:57
#> 2  2016-12-14 11:29:58 2016-12-14 11:29:58 2016-12-14 11:29:58
#> 3  2016-12-14 11:29:59 2016-12-14 11:29:59 2016-12-14 11:29:59
#> 4  2016-12-14 11:29:59 2016-12-14 11:30:00 2016-12-14 11:29:59
#> 5  2016-12-14 11:30:00 2016-12-14 11:30:00 2016-12-14 11:30:00
#> 6  2016-12-14 11:30:01 2016-12-14 11:30:01 2016-12-14 11:30:01
#> 7  2016-12-14 11:30:48 2016-12-14 11:30:48 2016-12-14 11:30:48
#> 8  2016-12-14 11:30:49 2016-12-14 11:30:49 2016-12-14 11:30:49
#> 9  2016-12-14 11:30:50 2016-12-14 11:30:50 2016-12-14 11:30:50
#> 10 2016-12-14 11:30:50 2016-12-14 11:30:50 2016-12-14 11:30:50
#> 11 2016-12-14 11:30:51 2016-12-14 11:30:51 2016-12-14 11:30:51
#>             chron_xlsx
#> 1  (12/16/86 11:29:57)
#> 2  (12/16/86 11:29:58)
#> 3  (12/16/86 11:29:59)
#> 4  (12/16/86 11:30:00)
#> 5  (12/16/86 11:30:01)
#> 6  (12/16/86 11:30:02)
#> 7  (12/16/86 11:30:48)
#> 8  (12/16/86 11:30:49)
#> 9  (12/16/86 11:30:50)
#> 10 (12/16/86 11:30:51)
#> 11 (12/16/86 11:30:52)

datetime.xlsx

@jennybc
Copy link
Member

jennybc commented Mar 11, 2017

Related SheetJS/ssf@c428205

Comments elsewhere re this fix to out.u is "Due to floating point issues, correct for sub seconds", "subseconds should be rounded up in this case".

var parse_date_code = function parse_date_code(v,opts) {
  var date = Math.floor(v), time = Math.floor(86400 * (v - date)+1e-6), dow=0;
  var dout=[], out={D:date, T:time, u:86400*(v-date)-time}; fixopts(opts = (opts||{}));
  if(opts.date1904) date += 1462;
  if(date > 2958465) return null;
  if(out.u > .999) {
    out.u = 0;
    ++time;
  }
...
}

@berkorbay
Copy link

berkorbay commented May 14, 2018

Sorry to reopen the case but I still get the error. Here is the file test_case.xlsx

What I see on Excel (format is DD/MM/YY HH:MM)
screenshot 2018-05-14 20 23 18

What I see on R
screenshot 2018-05-14 20 24 32

I tested and reproduced it on a Mac and a Linux machine. Version 1.1.0. I can provide more details if you like.

@jennybc
Copy link
Member

jennybc commented May 14, 2018

Will you open a new issue and, ideally, provide actually R code instead of screenshots? You're already pretty close to having an ideal reprex, since you're providing the xlsx.

How to provide a readxl reprex

We're in a much better position to address your issue if you can provide a reprex (reproducible example). Provide as much of this as you can:

  • An actual xls or xlsx file. Pick one:
    • Your personal xls or xlsx: try to strip it down to the minimal size and complexity to demonstrate your point. And, obviously, remove any sensitive data.
    • A publicly available xls or xlsx: provide URL and the code you used to download.
  • A small bit of R code that uses readxl on the provided xls or xlsx file and demonstrates your point.
    • Consider using the reprex package to prepare this. In addition to nice formatting, this ensures your reprex is self-contained.
  • Any details about your environment that seem clearly relevant, such as operating system.
    reprex(..., si = TRUE)
    will append a standard summary, folded neatly away, at the bottom of your reprex.

How to provide your own xls/xlsx file? In order of preference:

  1. Attach the file directly to your issue. Instructions are always at the bottom of the issue or comment box. .xlsx is a supported file type. You'll need to zip or gzip .xls so it appears as .zip or .gz.
  2. Share via DropBox or Google Drive and provide the link in your issue.
  3. Explain you absolutely cannot provide a relevant file via github.com and offer to provide privately.
  4. Don't share a file and realize you're hoping for, e.g., a bug fix with no concrete example to go on.

@lock lock bot locked and limited conversation to collaborators Oct 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior datetime 📆
Projects
None yet
Development

No branches or pull requests

5 participants