
Left join columns order changed #139


Closed
engineerchange opened this issue Dec 29, 2019 · 7 comments · Fixed by #209
Labels
feature a feature request or enhancement

Comments

@engineerchange

Thank you for all the work put into dtplyr; it is already saving me loads of time in my day-to-day work.

It appears that left_join() changes the column order. This doesn't appear to be an issue with right_join() or merge().

library(data.table)
library(dtplyr)
library(dplyr,warn.conflicts=FALSE)

packageVersion("data.table")
#> [1] '1.12.8'
packageVersion("dtplyr")
#> [1] '1.0.0.9000'

x1 <- data.frame(x = 1:3, y = 1)
x2 <- data.frame(x = 1:3, z = 2)

x1 %>%
  left_join(x2, by="x")
#>   x y z
#> 1 1 1 2
#> 2 2 1 2
#> 3 3 1 2

x1 %>%
  lazy_dt() %>%
  left_join(x2, by="x") %>%
  as_tibble()
#> # A tibble: 3 x 3
#>       x     z     y
#>   <int> <dbl> <dbl>
#> 1     1     2     1
#> 2     2     2     1
#> 3     3     2     1

Created on 2019-12-29 by the reprex package (v0.3.0)

@hadley
Member

hadley commented Jan 25, 2021

Somewhat more minimal reprex:

library(dtplyr)
library(dplyr, warn.conflicts=FALSE)

x1 <- data.frame(x = 1:3, y = 1)
x2 <- data.frame(x = 1:3, z = 2)

x1 %>%
  left_join(x2, by="x") %>% 
  names()
#> [1] "x" "y" "z"

x1 %>%
  lazy_dt() %>%
  left_join(x2, by="x") %>%
  collect() %>% 
  names()
#> [1] "x" "z" "y"

Created on 2021-01-25 by the reprex package (v0.3.0.9001)

@hadley hadley added the bug an unexpected problem or unintended behavior label Jan 25, 2021
@hadley
Member

hadley commented Jan 26, 2021

Hmmm, this is because of the way that data.table does joins, where the column names are the by vars, the y vars, then the x vars. Reordering the variables after the join would make the translation quite a bit more complicated, and I'm not sure if it's worth it.
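A minimal data.table-only sketch of the ordering described here. It assumes dtplyr translates `left_join(x1, x2, by = "x")` into the compact `x2[x1, on = "x"]` form, which is consistent with the `x z y` order in the reprexes above. In `X[i, on = ...]` every row of `i` is kept, and the result lists `X`'s columns first, followed by the non-join columns of `i`.

```r
library(data.table)

dt1 <- data.table(x = 1:3, y = 1)  # plays the role of dplyr's `x` table
dt2 <- data.table(x = 1:3, z = 2)  # plays the role of dplyr's `y` table

# Left join written in [ form: dt1 is `i`, so all of its rows are kept.
# Columns come out as: join key, dt2's other columns, then dt1's other columns.
res <- dt2[dt1, on = "x"]
names(res)
#> [1] "x" "z" "y"
```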

@hadley
Member

hadley commented Jan 26, 2021

I could just make this always return a merge() instead of trying to use the compact dt1[dt2] syntax. @MichaelChirico is this likely to have any performance implications?
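For comparison, a sketch of the same left join written with data.table's merge() method, using `all.x = TRUE` to keep every row of the left table. merge() places the `by` columns first, then the left table's remaining columns, then the right table's, which matches dplyr's `x y z` order.

```r
library(data.table)

dt1 <- data.table(x = 1:3, y = 1)
dt2 <- data.table(x = 1:3, z = 2)

# all.x = TRUE keeps all rows of dt1, i.e. a left join
res <- merge(dt1, dt2, by = "x", all.x = TRUE)
names(res)
#> [1] "x" "y" "z"
```

One difference worth keeping in mind: merge.data.table() sorts the result by the `by` columns by default (`sort = TRUE`), whereas dplyr's left_join() preserves the row order of the left table.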

@MichaelChirico
Contributor

MichaelChirico commented Jan 26, 2021 via email

@hadley
Copy link
Member

hadley commented Jan 26, 2021

None of which have obvious dplyr translations, so I think it's probably fine to always use merge() in the interest of hewing closer to dplyr behaviour.

@MichaelChirico
Contributor

I agree it makes sense.

FWIW, I think the issue could also be fixed with a call to setcolorderv() if an approach with [ were to be revisited.
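A sketch of that alternative: keep the compact `[` join and then reorder the columns in place. As far as I know, the exported data.table function for this is `setcolorder()`, which accepts a character vector of column names and reorders by reference. The example assumes the two tables share no non-key column names (otherwise data.table would add `i.`-prefixed columns and the expected-name vector would need adjusting).

```r
library(data.table)

dt1 <- data.table(x = 1:3, y = 1)
dt2 <- data.table(x = 1:3, z = 2)
by <- "x"

# Compact [ join: keeps all rows of dt1; columns come out as x, z, y
res <- dt2[dt1, on = by]

# Order dplyr would use: join keys, then dt1's other columns, then dt2's.
# Assumes dt1 and dt2 share no non-key column names.
wanted <- c(by, setdiff(names(dt1), by), setdiff(names(dt2), by))

# Reorder columns by reference (the column data itself is not copied)
setcolorder(res, wanted)
names(res)
#> [1] "x" "y" "z"
```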

@hadley
Member

hadley commented Jan 26, 2021

Hmmm, that might be easier. I think I have the info needed to do that only in collect(), when there's a mismatch between the dplyr and data.table orderings.
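A rough sketch of the "only reorder when there's a mismatch" idea. `reorder_like_dplyr()` is a hypothetical helper, not part of dtplyr: it assumes the caller already knows the column order dplyr would produce, leaves the result alone when it already matches, and otherwise reorders a copy so the input table is not modified.

```r
library(data.table)

# Hypothetical helper (not dtplyr API): reorder only if the names differ
reorder_like_dplyr <- function(dt, expected) {
  if (identical(names(dt), expected)) {
    return(dt)  # already in dplyr order, nothing to do
  }
  out <- copy(dt)              # avoid modifying the caller's table by reference
  setcolorder(out, expected)
  out
}

dt1 <- data.table(x = 1:3, y = 1)
dt2 <- data.table(x = 1:3, z = 2)

joined <- dt2[dt1, on = "x"]                      # columns: x, z, y
fixed <- reorder_like_dplyr(joined, c("x", "y", "z"))
names(fixed)
#> [1] "x" "y" "z"
names(joined)                                     # input left untouched
#> [1] "x" "z" "y"
```

The copy keeps the sketch side-effect free; an implementation that owns the intermediate result could call `setcolorder()` on it directly and skip the copy.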

@hadley hadley added feature a feature request or enhancement and removed bug an unexpected problem or unintended behavior labels Jan 30, 2021
hadley pushed a commit that referenced this issue Mar 3, 2021
And pull out `set_colorder()`

Fixes #139

3 participants