Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

POSIXct joins give weird results #819

Closed
nirski opened this issue Dec 1, 2014 · 5 comments
Closed

POSIXct joins give weird results #819

nirski opened this issue Dec 1, 2014 · 5 comments
Assignees
Labels
feature a feature request or enhancement
Milestone

Comments

@nirski
Copy link

nirski commented Dec 1, 2014

I'm not sure if this touches dplyr, or lubridate, or both.

> date.string <- "1915-01-01 08:00"

> date.lubridate <- ymd_hm(date.string)
> (date.lubridate <- seq(date.lubridate, length = 3, by = "5 y"))
[1] "1915-01-01 08:00:00 UTC" "1920-01-01 08:00:00 UTC" "1925-01-01 08:00:00 UTC"

> date.posixct <- as.POSIXct(strptime(date.string, "%Y-%m-%d %H:%M"))
> (date.posixct <- seq(date.posixct, length = 3, by = "5 y"))
[1] "1915-01-01 08:00:00 WMT" "1920-01-01 08:00:00 EET" "1925-01-01 08:00:00 CET"

Time zones were in their infancy those years, but they are well defined. Lubridate and base methods have different defaults, and this is fine.

> (c <- data_frame(date.lubridate, date.posixct))
Source: local data frame [3 x 2]

       date.lubridate        date.posixct
1 1915-01-01 08:00:00 1915-01-01 08:00:00
2 1920-01-01 08:00:00 1920-01-01 08:00:00
3 1925-01-01 08:00:00 1925-01-01 08:00:00

In a data frame the time zone is not obvious. But when we try to join...

> c %>% left_join(c)
Joining by: c("date.lubridate", "date.posixct")
Source: local data frame [3 x 2]

       date.lubridate        date.posixct
1 1915-01-01 09:24:00 1915-01-01 08:00:00
2 1920-01-01 10:00:00 1920-01-01 08:00:00
3 1925-01-01 09:00:00 1925-01-01 08:00:00

The first column goes awry silently. If a data frame is joined with itself, it should be indifferent to the timezone used.

> session_info()
Session info------------------------------------------------------------------------------
 setting  value                                      
 version  R version 3.1.2 Patched (2014-11-29 r67076)
 system   x86_64, mingw32                            
 ui       RStudio (0.99.81)                          
 language (EN)                                       
 collate  Polish_Poland.1250                         
 tz       Europe/Warsaw                              

Packages----------------------------------------------------------------------------------
 package    * version    date       source                           
 dplyr      * 0.3.0.9000 2014-11-25 Github (hadley/dplyr@592e905)    
 lubridate  * 1.3.3      2014-11-25 Github (hadley/lubridate@0a20213)
@hadley hadley added this to the 0.4 milestone Dec 2, 2014
@hadley hadley added the feature a feature request or enhancement label Dec 2, 2014
@hadley
Copy link
Member

hadley commented Dec 2, 2014

Here's a simpler example:

date1 <- structure(-1735660800, tzone = "America/Chicago", class = c("POSIXct", "POSIXt"))
date2 <- structure(-1735660800, tzone = "UTC", class = c("POSIXct", "POSIXt"))

df <- data_frame(cst = date1, utc = date2)
left_join(df, df)

@romainfrancois
Copy link
Member

I get:

> df %>% lapply(attributes)
$cst
$cst$tzone
[1] "America/Chicago"

$cst$class
[1] "POSIXct" "POSIXt"


$utc
$utc$tzone
[1] "UTC"

$utc$class
[1] "POSIXct" "POSIXt"

> left_join(df, df) %>% lapply(attributes)
Joining by: c("cst", "utc")
$cst
$cst$class
[1] "POSIXct" "POSIXt"


$utc
$utc$class
[1] "POSIXct" "POSIXt"

So the times are correct and we lose the time zone information.

@hadley
Copy link
Member

hadley commented Dec 3, 2014

Ideally we should preserve them, using a similar strategy to factors:

  • If all tz's are the same, use that in the output
  • If they are different, default to "UTC" tz (not the local tz)

We can work on this in 0.4 we tackle this for all vector types and all combination methods (rbind, joins, combine etc)

@romainfrancois
Copy link
Member

Ah. I had a fix that was just copying attributes from the left one, e.g. :

       inline SEXP promote( Vec vec){
            // vec.attr( "class" ) = JoinVisitorImpl::left.attr( "class" );
            copy_most_attributes(vec,JoinVisitorImpl::left ) ; 
            return vec ;
        }

I'll push this now, but we can think about it again with 0.4.

romainfrancois added a commit that referenced this issue Dec 3, 2014
  left argument for `POSIXct` and `Date` objects (#819).
@romainfrancois romainfrancois self-assigned this Aug 23, 2015
@hadley
Copy link
Member

hadley commented Aug 24, 2015

Looks good now.

@hadley hadley closed this as completed Aug 24, 2015
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
feature a feature request or enhancement
Projects
None yet
Development

No branches or pull requests

3 participants