New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joins are too strict about tz attributes #2643

Closed
hadley opened this Issue Apr 13, 2017 · 3 comments

Comments

Projects
2 participants
@hadley
Copy link
Member

hadley commented Apr 13, 2017

I think this is going to cause major pain for people

library(dplyr, warn.conflicts = FALSE)
library(nycflights13)

hourly_delay <- flights %>% 
  group_by(time_hour, origin) %>% 
  summarise(delay = mean(dep_delay, na.rm = TRUE))
    
hourly_delay %>% 
  inner_join(weather, by = c("time_hour", "origin"))
#> Error in inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, accept_na_match): attributes are different


attributes(flights$time_hour)
#> $class
#> [1] "POSIXct" "POSIXt" 
#> 
#> $tzone
#> [1] "UTC"
attributes(weather$time_hour)
#> $class
#> [1] "POSIXct" "POSIXt" 
#> 
#> $tzone
#> [1] ""

@zeehio would this be easy to fix?

@krlmlr could we get a better error message here? Ideally needs to give variable names on both sides of the join

@zeehio

This comment has been minimized.

Copy link
Contributor

zeehio commented Apr 13, 2017

The join logic goes through a different code path than the code paths I have worked with in the past (this is the third way I see to merge vectors in dplyr... I hope vctrs comes up soon!).

A quick fix can be done at (check_attribute_compatibility)[https://github.com/tidyverse/dplyr/blob/master/src/join.cpp#L44] so if both left and right inherit POSIXct and have the same time zone then the function returns without error. Consider "GMT", "UTC", and "" as equivalent. With that fix the POSIXct case should work fine. If nobody can push this patch I may do it, but I don't know when I'll have the time.

I believe that the best fix would be to make join_visitor use the Collecter.h logic, but that would be more work.

@krlmlr krlmlr added this to To-do in krlmlr Apr 13, 2017

@zeehio

This comment has been minimized.

Copy link
Contributor

zeehio commented Apr 13, 2017

Correction: ignore the "have the same time zone" constraint. If left and right inherit POSIXct then check_attribute_compatibility can return safely without error.

The timezone is just used for visualization, as far as I understand. So if someone joined different timezones the merge would be fine and the result would be printed in UTC (the default).

zeehio added a commit to zeehio/dplyr that referenced this issue Apr 16, 2017

@zeehio

This comment has been minimized.

Copy link
Contributor

zeehio commented Apr 16, 2017

I found the time to fix this. I added a test case. I did not work on the error messages though

@hadley hadley closed this in #2672 Apr 17, 2017

hadley added a commit that referenced this issue Apr 17, 2017

@krlmlr krlmlr moved this from To-do to Done in krlmlr Apr 17, 2017

@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.