New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast_strptime POSIX default incompatible with dplyr #403

Closed
skrackow opened this Issue Apr 18, 2016 · 14 comments

Comments

Projects
None yet
5 participants
@skrackow

skrackow commented Apr 18, 2016

Users applying lubridate as well as dplyr ran into troubles after fast_strptime default output class was change from POSIXct to POSIXlt, as highly useful functions like left_join or bind_rows do no longer work, as dplyr cannot handle POSIXlt. A global option to change fast_strptime's lt option would be highly appreciated! Thanks, SVEN.

@vspinu

This comment has been minimized.

Member

vspinu commented Apr 19, 2016

POSIXlt is for compatibility with strptime and is faster because of now internal conversion.

You can pass lt=F to fast_strptime or just use parse_date_time2 which returns POSIXct by default and works with strptfime style formats just fine.

If any of the above are "inconvenient", you can overwrite the function with your own in your .Rprofile. I am quite reluctant to add a global option for these high performance functions.

@skrackow

This comment has been minimized.

skrackow commented Apr 19, 2016

Thanks very much for your reply!
I was misled in assuming that "Hadley" package compatibility had priority.
If fact, makes me wonder why dplyr functions do not convert lt to ct if
encountered, as other coercions are made without much ado, like factor to
character if levels differ...

POSIXlt is for compatibility with strptime and is faster because of now
internal conversion.

You can pass lt=F to fast_strptime or just use parse_date_time2 which
returns POSIXct by default and works with strptfime style formats just fine.

Sure. Did it.

If any of the above are "inconvenient", you can overwrite the function
with your own in your .Rprofile. I am quite reluctant to add a global
option for these high performance functions.

As sort of "R scriptor" at the department, that is of course not an option:)
Thanks again,
SVEN.

Dr. Sven Krackow
Univ. Zürich, Anatomie

@vspinu

This comment has been minimized.

Member

vspinu commented Apr 19, 2016

If fact, makes me wonder why dplyr functions do not convert lt to ct if encountered, as other coercions are made without much ado, like factor to character if levels differ..

Good question. I think you should bring it to @hadley's attention in a separate dplyr thread.

@hadley

This comment has been minimized.

Member

hadley commented Apr 19, 2016

I strongly believe that no function should generate POSIXlt.

@vspinu

This comment has been minimized.

Member

vspinu commented Apr 19, 2016

stptime and fast_strptime do it for efficiency reasons. It avoids the overhead of double conversion (to POSIXct internally, and to POSIXlt by the final consumer). The consumer might want to use only some components of POSIXlt (year, month etc). In any case, whatever is the source of POSIXlt, I think dplyr should handle it gracefully.

@cderv

This comment has been minimized.

Contributor

cderv commented Apr 19, 2016

I intended to open a new issue but I see this very recent discussion in relation with POSIXlt and POSIXct ant changement in behaviour between 1.5.0 and 1.5.6
In fact, now that default behaviour as changed for dmy functions (in #391), functions that update date-time components generate a POSIXlt and not a POSIXct, when updating a Date. Code that worked with dplyr before are no more working since POSIXct became POSIXlt.

For example

library(lubridate)
x <- dmy("21-02-2010")
class(x)
class(update(x, hour = 20))
hour(x) <- 20
class(x)
  • with 1.5.0
library(lubridate)
x <- dmy("21-02-2010")
class(x)
#> [1] "POSIXct" "POSIXt" 
class(update(x, hour = 20))
#> [1] "POSIXct" "POSIXt" 
hour(x) <- 20
class(x)
#> [1] "POSIXct" "POSIXt" 
  • with 1.5.6
library(lubridate)
x <- dmy("21-02-2010")
class(x)
#> [1] "Date"
class(update(x, hour = 20))
#> [1] "POSIXlt" "POSIXt"
hour(x) <- 20
class(x)
#> [1] "POSIXlt" "POSIXt"

Because of this change, a use case to create datetime from 2 column of a data.frame is not working anymore with dplyr.

  • with 1.5.6
library(dplyr)
library(lubridate) # 1.5.6
DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))
DF %>%
  mutate(date = dmy(date)) %>%
  mutate(dtime = update(date, hour = hour))
#> Error: `mutate` does not support `POSIXlt` results
  • whereas with 1.5.0
library(dplyr)
library(lubridate)

DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))

DF %>%
  mutate(date = dmy(date)) %>%
  mutate(dtime = update(date, hour = hour))
#> Source: local data frame [2 x 3]
#> 
#>         date  hour               dtime
#>       (time) (chr)              (time)
#> 1 2012-10-21     5 2012-10-21 05:00:00
#> 2 2015-07-10    20 2015-07-10 20:00:00

If I change all my code, I could find the old behaviour with tz different to NULL

library(dplyr)
library(lubridate)
DF <- data_frame(date = c("21-10-2012", "10-07-2015"), hour = c("5","20"))
DF %>%
  mutate(date = dmy(date, tz = "UTC")) %>%
  mutate(dtime = update(date, hour = hour))
#> Source: local data frame [2 x 3]
#> 
#>         date  hour               dtime
#>       (time) (chr)              (time)
#> 1 2012-10-21     5 2012-10-21 05:00:00
#> 2 2015-07-10    20 2015-07-10 20:00:00 

I do not know if it should be considered as a problem or not. This issues between POSIXct, POSIXlt and Date sure break some previous code and cause some incompatibility between lubridate and dplyr - at least some checks to be sure to use POSIXct and not POSIXlt.

@hadley

This comment has been minimized.

Member

hadley commented Apr 19, 2016

I believe very strongly that the POSIXlt data structure is an internal implementation detail, and should not be exposed to the user. The data structure (a list of length 9 pretending to be a vector of length n) is fragile, and breaks in a wide variety of cases. Adding dplyr support for POSIXlt is not on the cards.

I would rather we look towards a hybrid data structure, i.e. something that is POSIXct at the core, but uses attributes to store parsed date time components. Then lubridate could use those attributes if available, or could otherwise compute from scratch.

@vspinu

This comment has been minimized.

Member

vspinu commented Apr 19, 2016

Adding dplyr support for POSIXlt is not on the cards.

Cannot you simply convert it to POSIXct on assignment?

I would rather we look towards a hybrid data structure, i.e. something that is POSIXct at the core, but uses attributes to store parsed date time components.

Yet another data structure for date-times? Sound like a complication.

Lubridate should produce lt objects only when requested or input is already an lt. The fact that update(date, hour = hour) produce an lt object is a concern. We can change that, but I wonder if there was a special reason to update date into lt in the first place.

@hadley

This comment has been minimized.

Member

hadley commented Apr 19, 2016

I'd rather people never produce POSIXlt in the first place, rather than trying to patch things up after the fact.

@cderv

This comment has been minimized.

Contributor

cderv commented Apr 19, 2016

Lubridate should produce lt objects only when requested or input is already an lt. The fact that update(date, hour = hour) produce an lt object is a concern.

update.Date is define to convert object argument to POSIXlt, then it calls update.POSIXt with a POSIXlt object, and this method give a results in the same class so POSIXlt.

update.Date <- function(object, ...){
  lt <- as.POSIXlt(object, tz = "UTC")
  new <- update(lt, ...)
  if (sum(c(new$hour, new$min, new$sec), na.rm = TRUE)) {
    new
  } else {
    as.Date(new)
  }
}

as.POSIXct could be added in the result to insure the class

update.Date <- function(object, ...){
  lt <- as.POSIXlt(object, tz = "UTC")
  new <- update(lt, ...)
  if (sum(c(new$hour, new$min, new$sec), na.rm = TRUE)) {
    as.POSIXct(new)
  } else {
    as.Date(new)
  }
}

We can change that, but I wonder if there was a special reason to update date into lt in the first place.

I wonder it too know but it is a convenient way to create datetime from date and times columns. Other ways is to use truncated arguments of lubridate's functions or to concatenate in one strings date and times to read them with dmy_hms functions (and others). I find update pretty convenient and as it allows to update hour, minutes and seconds from a Date object, I thought it could be use that way.

@vspinu vspinu closed this in 4ef4a03 Apr 19, 2016

@vspinu

This comment has been minimized.

Member

vspinu commented Apr 19, 2016

I have changed the update on Date to return POSIXct. Surprisingly, that didn't break any tests.

As already said, changing default output of fast_strptime is out of question for consistency and efficiency reasons. Use lt=F, or parse_date_time[2] or high level ymd[_hms] family.

@vspinu

This comment has been minimized.

Member

vspinu commented Apr 19, 2016

Other ways is to use truncated arguments

BTW, you can still use ymd to produce POSIXct instead of Date by supplying tz argumetn, like ymd(.., tz="UTC"). This should be rarely needed though.

@ldecicco-USGS

This comment has been minimized.

ldecicco-USGS commented Jun 14, 2016

I just want to add that the change in behavior for fast_strptime has cause much headache for me (where older versions of lubridate defaulted to POSIXct, and newer versions default to POSIXlt). If I add in the argument lt=TRUE, users with older versions of lubridate get the error "unused argument (lt = FALSE)". Even if I use:

lubridate (>= 1.5.0),

in the DESCRIPTION, some user only update specific packages (why? I have no idea, it's so easy to just update all....), and R doesn't automatically update versions of dependent packages as far as I can tell. I basically need to check what version of lubridate the user has within my package's function, then call fast_strptime with or without the argument.

Also, for what it's worth, that is still vastly superior than using base R!

@vspinu

This comment has been minimized.

Member

vspinu commented Jun 14, 2016

I basically need to check what version of lubridate

You can use parse_date_time2 or explicitly convert as.POSIXct(fast_strptime(...)) to avoid checking for the version.

Sorry for the trouble. This lt/ct change (here and in ymd) turned to be a much bigger headache than anyone anticipated, but it's the only right decision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment