Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic joins #146

Merged
merged 1 commit into from
Feb 9, 2019
Merged

Generic joins #146

merged 1 commit into from
Feb 9, 2019

Conversation

krlmlr
Copy link
Member

@krlmlr krlmlr commented Aug 23, 2018

Proposing a syntax to pass arbitrary join predicates, e.g. for working with spatial data on the database. Needs a test.

library(tidyverse)
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

tbl1 <- memdb_frame(a = 1:3, b = 4:2)
tbl2 <- memdb_frame(c = 1:3, b = 2:0)

tbl12 <- left_join(tbl1, tbl2, by = sql("TBL_LEFT.b < TBL_RIGHT.c"))
tbl12 %>% sql_render()
#> <SQL> SELECT `TBL_LEFT`.`a` AS `a`, `TBL_LEFT`.`b` AS `b.x`, `TBL_RIGHT`.`c` AS `c`, `TBL_RIGHT`.`b` AS `b.y`
#>   FROM `ygyjgeeqdc` AS `TBL_LEFT`
#>   LEFT JOIN `urhwkxzyib` AS `TBL_RIGHT`
#>   ON (TBL_LEFT.b < TBL_RIGHT.c)
tbl12
#> # Source:   lazy query [?? x 4]
#> # Database: sqlite 3.22.0 [:memory:]
#>       a   b.x     c   b.y
#>   <int> <int> <int> <int>
#> 1     1     4    NA    NA
#> 2     2     3    NA    NA
#> 3     3     2     3     0

Created on 2018-08-23 by the reprex package (v0.2.0).

@krlmlr
Copy link
Member Author

krlmlr commented Aug 31, 2018

@hadley @edgararuiz: Do we want to support joins with custom join predicates? Another way to achieve this is a cross join with a filter, but

  • it would work better if we had a cross_join() verb (full_join() with empty by doesn't work in edge cases where column names overlap)
  • perhaps some databases perform better with JOIN ... ON than with a filter on a cross join in a subquery

@krlmlr
Copy link
Member Author

krlmlr commented Oct 22, 2018

left_join(tbl1, tbl2, on = sql("TBL_LEFT.b < TBL_RIGHT.c"))

?

@krlmlr
Copy link
Member Author

krlmlr commented Oct 22, 2018

left_join(tbl1, tbl2, sql_on = "TBL_LEFT.b < TBL_RIGHT.c")

@hadley
Copy link
Member

hadley commented Oct 22, 2018

Yeah, I like that as a temporary stop gap.

@krlmlr krlmlr changed the title WIP: Generic joins Generic joins Nov 5, 2018
@krlmlr
Copy link
Member Author

krlmlr commented Nov 5, 2018

A proper on argument that would take an expression would be more in line with the rest of the interface, but is not as useful for spatial joins due to #181.

I wonder if we should rename the aliases to x and y, or if this would cause too much trouble.

@hadley
Copy link
Member

hadley commented Jan 2, 2019

I don't think we need any extra work on this interface, as it's mostly meant as a stop gap until we come up with a general interface. But could you please add a couple of very simple tests of the SQL generation? (Just to make sure everything is plumbed together correctly)

@krlmlr
Copy link
Member Author

krlmlr commented Feb 2, 2019

Rebased on current master, added a few integration tests. Do we still need tests for the SQL generation?

@hadley
Copy link
Member

hadley commented Feb 5, 2019

I think it might be better to turn the test into a regression test on the generated SQL? (i.e. using expect_known_output())

@krlmlr
Copy link
Member Author

krlmlr commented Feb 9, 2019

Done, also adapted to the new naming.

@hadley hadley merged commit 4b5aab9 into tidyverse:master Feb 9, 2019
@hadley
Copy link
Member

hadley commented Feb 9, 2019

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants