Increase precision of xls datetimes when coerced to character #431

Merged

merged 5 commits on Mar 16, 2018

jennybc commented Mar 16, 2018

Fixes #430: read_xls rounding date times when col_types = "text"

@jennybc jennybc requested a review from jimhester Mar 16, 2018

jennybc commented Mar 16, 2018

@jimhester This "fixes" the problem, but I am not entirely satisfied because:

  • I don't have a good handle on why the precision was previously so low (?) on the xls side, nor how it's currently set on the xlsx side.
  • I feel uncertain about how many digits of agreement I should expect between xlsx and xls in the test.

General context, in case it's not clear: until the day when we can translate xls(x) date time formats to R date time formats, datetimes will be coerced to character as if they were just regular doubles.
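For readers wondering what those "regular doubles" are: Excel stores a datetime as a serial day count, where the integer part counts days and the fractional part is the time of day. The minimal C++ sketch below converts such a serial to Unix seconds. It assumes the 1900 date system (serial 25569 is 1970-01-01), serials greater than 60 (so Excel's fictitious 1900-02-29 never comes into play), and that workbook times can be treated as UTC. The function name is hypothetical and is not part of readxl.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper, not part of readxl: converts an Excel serial
// date-time (the double that gets coerced to character) into whole
// seconds since 1970-01-01 00:00:00, treating the workbook time as UTC.
// Assumes serial > 60, so Excel's fictitious 1900-02-29 is not involved.
std::int64_t excel_serial_to_unix_seconds(double serial, bool date1904 = false) {
  // Serial of 1970-01-01: 25569 in the 1900 date system. The 1904 system
  // counts from 1904-01-01, which is 1462 days later.
  const double unix_epoch_serial = date1904 ? 25569.0 - 1462.0 : 25569.0;
  const double seconds_per_day = 86400.0;
  // Integer part = days since the epoch, fractional part = time of day.
  // Rounding to the nearest second absorbs floating-point noise such as
  // the trailing ...672 in 31117.541666666672.
  return std::llround((serial - unix_epoch_serial) * seconds_per_day);
}
```

Under these assumptions, the first reprex value, 31117.541666666672, maps to 479394000 seconds, i.e. 1985-03-11 13:00:00. That is why truncating the string to a few significant digits visibly shifts the time of day.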

Do you have any advice re: setting precision and/or testing?


devtools::load_all(here::here())
#> Loading readxl
xlsx <- read_excel(test_sheet("texty-dates-xlsx.xlsx"), col_types = "text")
xls <- read_excel(test_sheet("texty-dates-xls.xls"), col_types = "text")

xlsx
#> # A tibble: 2 x 1
#>   a                 
#>   <chr>             
#> 1 31117.541666666672
#> 2 31117.558009259261
xls
#> # A tibble: 2 x 1
#>   a            
#>   <chr>        
#> 1 31117.5416667
#> 2 31117.5580093

Created on 2018-03-15 by the reprex package (v0.2.0).
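On the question of how many digits of agreement to expect, one option is to compare the two results numerically rather than as strings. In the reprex, the 12-digit xls strings differ from the xlsx strings by roughly 1e-12 in relative terms. A relative tolerance around 1e-11 therefore accepts them, and 1e-13 does not. Below is a minimal C++ sketch with a hypothetical helper name. In the R test suite itself, the analogous approach would be to convert both columns with as.numeric() and pass a tolerance to testthat::expect_equal().

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Hypothetical test helper (not part of readxl or testthat): parses two
// serial-date strings and checks that they agree within a relative
// tolerance, instead of requiring an exact character-by-character match.
bool serials_agree(const std::string& a, const std::string& b, double rel_tol) {
  const double x = std::stod(a);
  const double y = std::stod(b);
  // Scale the tolerance by the larger magnitude so it is relative.
  const double scale = std::fmax(std::fabs(x), std::fabs(y));
  return std::fabs(x - y) <= rel_tol * scale;
}
```

Usage: serials_agree("31117.541666666672", "31117.5416667", 1e-11) is true, while the same call with 1e-13 is false.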

@jimhester

Why can't we do this now?

(Re: "translate xls(x) date time formats to R date time formats")

@@ -247,7 +247,7 @@ class XlsCell {
   if (std::modf(cell_->d, &intpart) == 0.0) {
     strs << std::fixed << (int64_t)cell_->d;
   } else {
-    strs << cell_->d;
+    strs << std::setprecision(12) << cell_->d;
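A likely answer to the "why was the xls precision so low" question, assuming strs is a default-constructed std::ostringstream: the removed line streamed the double using the stream's default precision of 6 significant digits. As a result, 31117.541666666672 came out as 31117.5. The sketch below restates the diff's branch as a standalone function. The function name and its precision parameter are hypothetical; the parameter exists only so the old and new behaviour can be compared side by side.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Sketch of the formatting branch from the diff, as a free function.
// precision <= 0 reproduces the old line (stream default, 6 significant
// digits); precision == 12 reproduces the patched line.
std::string numeric_cell_to_string(double d, int precision) {
  std::ostringstream strs;
  double intpart;
  if (std::modf(d, &intpart) == 0.0) {
    // Whole numbers are printed as integers, with no decimal point.
    strs << std::fixed << static_cast<std::int64_t>(d);
  } else {
    if (precision > 0) {
      strs << std::setprecision(precision);
    }
    strs << d;  // general notation, trailing zeros dropped
  }
  return strs.str();
}
```

With precision 0 the two reprex values become "31117.5" and "31117.6". With precision 12 they become "31117.5416667" and "31117.5580093", which matches the xls output in the reprex.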

jimhester Mar 16, 2018

This needs to be something like std::numeric_limits<double>::digits10 + 2 (which works out to 17 for 64-bit doubles) to ensure full precision. See the proposal linked at https://stackoverflow.com/questions/554063/how-do-i-print-a-double-value-with-full-precision-using-cout#comment29144568_554134 for details on why.
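The reviewer's point can be checked with a round trip. Print the double with a given number of significant digits, parse the text back, and see whether the same double comes out. With 12 digits, the reprex value fails. With std::numeric_limits<double>::max_digits10 (C++11's name for the digit count that guarantees a round trip, 17 for IEEE 754 doubles), it succeeds. The helper names are hypothetical, and the sketch assumes correctly rounded formatting and parsing, as in common standard library implementations.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Formats x with `digits` significant digits in general notation.
std::string format_sig(double x, int digits) {
  std::ostringstream out;
  out << std::setprecision(digits) << x;
  return out.str();
}

// True if formatting x with `digits` significant digits and parsing the
// text back with std::stod recovers exactly the same double.
bool round_trips(double x, int digits) {
  return std::stod(format_sig(x, digits)) == x;
}

// For IEEE 754 double, digits10 is 15 and max_digits10 is 17, so the
// reviewer's digits10 + 2 equals max_digits10. (That identity does not
// hold for float: 6 + 2 versus 9.)
constexpr int kFullPrecision = std::numeric_limits<double>::max_digits10;
```

Using max_digits10 instead of digits10 + 2 states the intent directly and stays correct if the type ever changes.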

jennybc Mar 16, 2018

OK, thanks. Do you think I should preemptively do the same on the xlsx side, so that both sides do the "right" thing and the same thing? Disregard: I just remembered that there is no coercion on the xlsx side; the serial date is already a string.

Re: converting time format strings, our impression is that this is doable but it's also not a tiny piece of work. The plan is to use or build on this: https://github.com/WizardMac/TimeFormatStrings, which would have application across multiple packages.

jimhester Mar 16, 2018

Alright, I apparently misunderstood how dates are represented in the xls format. LGTM!

@jennybc jennybc merged commit 4af8b31 into tidyverse:master Mar 16, 2018

4 checks passed

codecov/patch 100% of diff hit (target 91.56%)
codecov/project 91.56% (+0%) compared to ed499fb
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@jennybc jennybc deleted the jennybc:bugfix-430-xls-precision branch Mar 16, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment