New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSIXct joins give weird results #819

Closed
nirski opened this Issue Dec 1, 2014 · 5 comments

Comments

Projects
None yet
3 participants
@nirski

nirski commented Dec 1, 2014

I'm not sure if this touches dplyr, or lubridate, or both.

> date.string <- "1915-01-01 08:00"

> date.lubridate <- ymd_hm(date.string)
> (date.lubridate <- seq(date.lubridate, length = 3, by = "5 y"))
[1] "1915-01-01 08:00:00 UTC" "1920-01-01 08:00:00 UTC" "1925-01-01 08:00:00 UTC"

> date.posixct <- as.POSIXct(strptime(date.string, "%Y-%m-%d %H:%M"))
> (date.posixct <- seq(date.posixct, length = 3, by = "5 y"))
[1] "1915-01-01 08:00:00 WMT" "1920-01-01 08:00:00 EET" "1925-01-01 08:00:00 CET"

Time zones were in their infancy those years, but they are well defined. Lubridate and base methods have different defaults, and this is fine.

> (c <- data_frame(date.lubridate, date.posixct))
Source: local data frame [3 x 2]

       date.lubridate        date.posixct
1 1915-01-01 08:00:00 1915-01-01 08:00:00
2 1920-01-01 08:00:00 1920-01-01 08:00:00
3 1925-01-01 08:00:00 1925-01-01 08:00:00

In a data frame the time zone is not obvious. But when we try to join...

> c %>% left_join(c)
Joining by: c("date.lubridate", "date.posixct")
Source: local data frame [3 x 2]

       date.lubridate        date.posixct
1 1915-01-01 09:24:00 1915-01-01 08:00:00
2 1920-01-01 10:00:00 1920-01-01 08:00:00
3 1925-01-01 09:00:00 1925-01-01 08:00:00

The first column goes awry silently. If a data frame is joined with itself, it should be indifferent to the timezone used.

> session_info()
Session info------------------------------------------------------------------------------
 setting  value                                      
 version  R version 3.1.2 Patched (2014-11-29 r67076)
 system   x86_64, mingw32                            
 ui       RStudio (0.99.81)                          
 language (EN)                                       
 collate  Polish_Poland.1250                         
 tz       Europe/Warsaw                              

Packages----------------------------------------------------------------------------------
 package    * version    date       source                           
 dplyr      * 0.3.0.9000 2014-11-25 Github (hadley/dplyr@592e905)    
 lubridate  * 1.3.3      2014-11-25 Github (hadley/lubridate@0a20213)

@hadley hadley added this to the 0.4 milestone Dec 2, 2014

@hadley hadley added the feature label Dec 2, 2014

@hadley

This comment has been minimized.

Member

hadley commented Dec 2, 2014

Here's a simpler example:

date1 <- structure(-1735660800, tzone = "America/Chicago", class = c("POSIXct", "POSIXt"))
date2 <- structure(-1735660800, tzone = "UTC", class = c("POSIXct", "POSIXt"))

df <- data_frame(cst = date1, utc = date2)
left_join(df, df)
@romainfrancois

This comment has been minimized.

Member

romainfrancois commented Dec 3, 2014

I get:

> df %>% lapply(attributes)
$cst
$cst$tzone
[1] "America/Chicago"

$cst$class
[1] "POSIXct" "POSIXt"


$utc
$utc$tzone
[1] "UTC"

$utc$class
[1] "POSIXct" "POSIXt"

> left_join(df, df) %>% lapply(attributes)
Joining by: c("cst", "utc")
$cst
$cst$class
[1] "POSIXct" "POSIXt"


$utc
$utc$class
[1] "POSIXct" "POSIXt"

So the times are correct and we lose the time zone information.

@hadley

This comment has been minimized.

Member

hadley commented Dec 3, 2014

Ideally we should preserve them, using a similar strategy to factors:

  • If all tz's are the same, use that in the output
  • If they are different, default to "UTC" tz (not the local tz)

We can work on this in 0.4 we tackle this for all vector types and all combination methods (rbind, joins, combine etc)

@romainfrancois

This comment has been minimized.

Member

romainfrancois commented Dec 3, 2014

Ah. I had a fix that was just copying attributes from the left one, e.g. :

       inline SEXP promote( Vec vec){
            // vec.attr( "class" ) = JoinVisitorImpl::left.attr( "class" );
            copy_most_attributes(vec,JoinVisitorImpl::left ) ; 
            return vec ;
        }

I'll push this now, but we can think about it again with 0.4.

romainfrancois added a commit that referenced this issue Dec 3, 2014

Join functions keep attributes (e.g. time zone information) from the
  left argument for `POSIXct` and `Date` objects (#819).

@romainfrancois romainfrancois self-assigned this Aug 23, 2015

romainfrancois added a commit that referenced this issue Aug 23, 2015

@hadley

This comment has been minimized.

Member

hadley commented Aug 24, 2015

Looks good now.

@hadley hadley closed this Aug 24, 2015

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.