Multiple joins in one query #865

Closed
mgirlich opened this issue May 10, 2022 · 4 comments · Fixed by #984

Comments

@mgirlich
Collaborator

In databases one often needs to join several tables. In dbplyr, chaining joins currently produces one nested subquery per additional join, where one would usually write a single query by hand.

Example

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

lf <- lazy_frame(x = 1, a = 1)
lf2 <- lazy_frame(x = 1, b = 2)
lf3 <- lazy_frame(x = 1, c = 3)

left_join(lf, lf2, by = "x") %>% 
  left_join(lf3, by = "x")

Created on 2022-05-10 by the reprex package (v2.0.1)

Currently produces

SELECT `LHS`.`x` AS `x`, `a`, `b`, `c`
FROM (
  SELECT `LHS`.`x` AS `x`, `a`, `b`
  FROM `df1` AS `LHS`
  LEFT JOIN `df2` AS `RHS`
    ON (`LHS`.`x` = `RHS`.`x`)
) `LHS`
LEFT JOIN `df3` AS `RHS`
  ON (`LHS`.`x` = `RHS`.`x`)

It would be nicer if it could produce something like

SELECT `df1`.`x` AS `x`, `a`, `b`, `c`
FROM `df1`
LEFT JOIN `df2`
  ON (`df1`.`x` = `df2`.`x`)
LEFT JOIN `df3`
  ON (`df1`.`x` = `df3`.`x`)

Thoughts & Questions

  • Is the result of joins in subqueries and multiple joins in one query necessarily the same?
  • What about table aliases?
    • Now we have x_as and y_as. I think if they are not provided it might make more sense to not use a table alias.
    • If x_as is provided in a join which is not the first join, then maybe a subquery should be generated
    • If y_as is provided it can always be used
  • A FULL JOIN can be tricky to combine with other joins:
    • SQLite does not directly support FULL JOIN
    • The join columns are wrapped in coalesce()
  • If semi_join() or anti_join() is followed by left/right/inner/full_join() they cannot be combined, because semi and anti joins are translated to a WHERE EXISTS / WHERE NOT EXISTS clause and WHERE is evaluated after JOIN

So, I think that

  • a sequence of left_join() and inner_join() can be combined in one query
  • a sequence of semi_join() and anti_join() might be combined (though nested queries might be more efficient)
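
To make the first point concrete, here is a minimal sketch of how a run of left/inner joins could be rendered as one flat FROM clause. `flatten_joins()` is a hypothetical helper written for illustration, not a dbplyr function, and it assumes every join uses the same key column and compares against the first table, as in the example above.

```r
# Hypothetical helper (not part of dbplyr): render a chain of joins
# as a single flat FROM clause.
# Assumption: every join uses the same key column `by` and is keyed
# against the first table.
flatten_joins <- function(first, tables, types, by) {
  # Only left and inner joins are treated as safe to combine here.
  stopifnot(
    length(tables) == length(types),
    all(types %in% c("left", "inner"))
  )
  clauses <- sprintf(
    "%s JOIN `%s`\n  ON (`%s`.`%s` = `%s`.`%s`)",
    toupper(types), tables, first, by, tables, by
  )
  paste(c(sprintf("FROM `%s`", first), clauses), collapse = "\n")
}

sql <- flatten_joins("df1", c("df2", "df3"), c("left", "left"), by = "x")
cat(sql, "\n")
```

Any other join type (full, semi, anti) makes the helper stop, which mirrors the idea that those cases would still fall back to a subquery.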
@mgirlich mgirlich added this to the 2.3.0 milestone May 10, 2022
@hadley
Member

hadley commented May 20, 2022

As an initial step, maybe avoid aliases if possible e.g.

SELECT `LHS`.`x` AS `x`, `a`, `b`
FROM `df1` AS `LHS`
LEFT JOIN `df2` AS `RHS`
    ON (`LHS`.`x` = `RHS`.`x`)

could be

SELECT `df1`.`x` AS `x`, `a`, `b`
FROM `df1`
LEFT JOIN `df2` ON (`df1`.`x` = `df2`.`x`)

It's probably much less important to avoid nested queries for semi/anti joins, since I think it's much rarer to use multiple in one query.

@mgirlich
Collaborator Author

> As an initial step, maybe avoid aliases if possible e.g.

Good idea, this is done in PR #892.

@hadley
Member

hadley commented May 31, 2022

Another random thought: when we have to use aliases, it'd be cool to use abbreviate() to generate them, since it has a pretty nice algorithm. (Maybe replace _ with " " before calling abbreviate() to better handle snake case). The main downside is that we'd need to maintain a list of all tables to make sure we didn't accidentally generate a duplicate.
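
A rough sketch of that idea, assuming a hypothetical helper `make_table_aliases()` (not an existing dbplyr function) that receives the full list of table names in the query: underscores are replaced with spaces before calling `abbreviate()`, and `make.unique()` handles the duplicate check.

```r
# Hypothetical helper (not part of dbplyr): derive short, unique
# table aliases from table names.
make_table_aliases <- function(tables, minlength = 3) {
  # Treat snake_case parts as separate words so abbreviate() sees
  # word boundaries.
  spaced <- gsub("_", " ", tables, fixed = TRUE)
  abbr <- abbreviate(spaced, minlength = minlength, named = FALSE)
  # Names that were already short enough keep their spaces; drop them.
  abbr <- gsub(" ", "", abbr, fixed = TRUE)
  # A table used twice (self-join) or colliding abbreviations get a
  # numeric suffix.
  make.unique(abbr, sep = "_")
}

aliases <- make_table_aliases(
  c("flight_delays", "flight_details", "airports", "airports")
)
print(aliases)
```

Passing all table names at once is what makes the uniqueness check possible, which is the bookkeeping cost mentioned above.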

@mgirlich
Collaborator Author

mgirlich commented Jun 1, 2022

> Another random thought: when we have to use aliases, it'd be cool to use abbreviate() to generate them, since it has a pretty nice algorithm. (Maybe replace _ with " " before calling abbreviate() to better handle snake case).

Nice, I like that!

> The main downside is that we'd need to maintain a list of all tables to make sure we didn't accidentally generate a duplicate.

To allow multiple joins in one query we need that list anyway, so this is not a big deal.
