Skip to content

Multiple joins in one query #865

@mgirlich

Description

@mgirlich

In databases one often needs to join multiple tables. In dbplyr this produces rather many nested queries which one would more often write as a single query.

Example

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

lf <- lazy_frame(x = 1, a = 1)
lf2 <- lazy_frame(x = 1, b = 2)
lf3 <- lazy_frame(x = 1, c = 3)

left_join(lf, lf2, by = "x") %>% 
  left_join(lf3, by = "x")

Created on 2022-05-10 by the reprex package (v2.0.1)

Currently produces

SELECT `LHS`.`x` AS `x`, `a`, `b`, `c`
FROM (
  SELECT `LHS`.`x` AS `x`, `a`, `b`
  FROM `df1` AS `LHS`
  LEFT JOIN `df2` AS `RHS`
    ON (`LHS`.`x` = `RHS`.`x`)
) `LHS`
LEFT JOIN `df3` AS `RHS`
  ON (`LHS`.`x` = `RHS`.`x`)

It would be nicer if it could produce something like

SELECT `df`.`x` AS `x`, `a`, `b`, `c`
FROM `df1`
LEFT JOIN `df2`
  ON (`df1`.`x` = `df2`.`x`)
LEFT JOIN `df3`
  ON (`df1`.`x` = `df3`.`x`)

Thoughts & Questions

  • Is the result of joins in subqueries and multiple joins in one query necessarily the same?
  • What about table aliases?
    • Now we have x_as and y_as. I think if they are not provided it might make more sense to not use a table alias.
    • If x_as is provided in a join which is not the first join, then maybe a subquery should be generated
    • If y_as is provided it can always be used
  • A FULL JOIN can be tricky to combine with other joins:
    • SQLite does not directly support FULL JOIN
    • The columns joined by are coalesce()
  • If semi_join() or anti_join() is followed by left/right/inner/full_join() they cannot be combined because WHERE is evaluated after JOIN

So, I think that

  • a sequence of left_join() and inner_join() can be combined in one query
  • a sequence of semi_join() and anti_join() might be combined (though nested queries might be more efficient)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions