Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joins are too strict about tz attributes #2643

Closed
hadley opened this issue Apr 13, 2017 · 3 comments
Closed

Joins are too strict about tz attributes #2643

hadley opened this issue Apr 13, 2017 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@hadley
Copy link
Member

hadley commented Apr 13, 2017

I think this is going to cause major pain for people

library(dplyr, warn.conflicts = FALSE)
library(nycflights13)

hourly_delay <- flights %>% 
  group_by(time_hour, origin) %>% 
  summarise(delay = mean(dep_delay, na.rm = TRUE))
    
hourly_delay %>% 
  inner_join(weather, by = c("time_hour", "origin"))
#> Error in inner_join_impl(x, y, by$x, by$y, suffix$x, suffix$y, accept_na_match): attributes are different


attributes(flights$time_hour)
#> $class
#> [1] "POSIXct" "POSIXt" 
#> 
#> $tzone
#> [1] "UTC"
attributes(weather$time_hour)
#> $class
#> [1] "POSIXct" "POSIXt" 
#> 
#> $tzone
#> [1] ""

@zeehio would this be easy to fix?

@krlmlr could we get a better error message here? Ideally needs to give variable names on both sides of the join

@zeehio
Copy link
Contributor

zeehio commented Apr 13, 2017

The join logic goes through a different code path than the code paths I have worked with in the past (this is the third way I see to merge vectors in dplyr... I hope vctrs comes up soon!).

A quick fix can be done at (check_attribute_compatibility)[https://github.com/tidyverse/dplyr/blob/master/src/join.cpp#L44] so if both left and right inherit POSIXct and have the same time zone then the function returns without error. Consider "GMT", "UTC", and "" as equivalent. With that fix the POSIXct case should work fine. If nobody can push this patch I may do it, but I don't know when I'll have the time.

I believe that the best fix would be to make join_visitor use the Collecter.h logic, but that would be more work.

@zeehio
Copy link
Contributor

zeehio commented Apr 13, 2017

Correction: ignore the "have the same time zone" constraint. If left and right inherit POSIXct then check_attribute_compatibility can return safely without error.

The timezone is just used for visualization, as far as I understand. So if someone joined different timezones the merge would be fine and the result would be printed in UTC (the default).

@hadley hadley added bug an unexpected problem or unintended behavior data frame labels Apr 14, 2017
zeehio added a commit to zeehio/dplyr that referenced this issue Apr 16, 2017
@zeehio
Copy link
Contributor

zeehio commented Apr 16, 2017

I found the time to fix this. I added a test case. I did not work on the error messages though

@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

2 participants