Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Join custom alias #637

Merged
merged 20 commits into from
Mar 1, 2022
Merged

Join custom alias #637

merged 20 commits into from
Mar 1, 2022

Conversation

mgirlich
Copy link
Collaborator

@mgirlich mgirlich commented Apr 13, 2021

  • add lhs_as and rhs_as to join-verbs
  • pass it along
  • wrapping x and y in subquery moves from sql_render() to sql_query_join() resp. sql_query_semi_join() because they gain lhs_as as argument

@mgirlich mgirlich requested a review from krlmlr July 30, 2021 11:57
@mgirlich mgirlich mentioned this pull request Aug 13, 2021
Copy link
Member

@krlmlr krlmlr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really love the idea of giving the LHS and RHS custom names. Two remarks:

  • The patch is relatively large, because the new _as arguments are passed along everywhere. Can we smuggle lhs_as and rhs_as somewhere in the existing arguments, e.g. in names(x) or by ?
  • Can we use a smart default for lhs_as and rhs_as, e.g. deparse(substitute(x))) ? This will be somewhat tricky with the generic but might be worthwhile.

Conflicts:
	R/backend-oracle.R
	tests/testthat/_snaps/backend-oracle.md
@mgirlich
Copy link
Collaborator Author

  • The patch is relatively large, because the new _as arguments are passed along everywhere. Can we smuggle lhs_as and rhs_as somewhere in the existing arguments, e.g. in names(x) or by ?

names(x) feels quite indirect and not that safe so I've gone via by.

  • Can we use a smart default for lhs_as and rhs_as, e.g. deparse(substitute(x))) ? This will be somewhat tricky with the generic but might be worthwhile.

Hmm, I think this only makes sense if x or y is a symbol.

@mgirlich mgirlich mentioned this pull request Nov 24, 2021
@hadley
Copy link
Member

hadley commented Dec 9, 2021

What's the advantage of this? Just easier to read SQL?

@mgirlich
Copy link
Collaborator Author

mgirlich commented Dec 9, 2021

Yes, I think that is the only advantage. But I think that can help readability a lot.

R/backend-sqlite.R Outdated Show resolved Hide resolved
R/verb-joins.R Outdated
@@ -68,7 +70,8 @@ NULL
inner_join.tbl_lazy <- function(x, y, by = NULL, copy = FALSE,
suffix = NULL,
auto_index = FALSE, ...,
sql_on = NULL, na_matches = c("never", "na")) {
sql_on = NULL, na_matches = c("never", "na"),
lhs_as = "LHS", rhs_as = "RHS") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use x_as and y_as for consistency with first arguments? Or would it be better to have as = c("LHS", "RHS")? Or as_x and as_y?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having only one argument saves a bit of typing but I prefer having two arguments. To me x_as and y_as read the most natural.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have mild preference towards making the common part of a name the prefix (since it helps with autocomplete), but that doesn't make much difference here. If you prefer x_as and y_as then we can stick with that.

@@ -22,10 +22,10 @@
LEFT JOIN `df` AS `RHS`
ON (`LHS`.`x` = `RHS`.`x`)
UNION
SELECT COALESCE(`LHS`.`x`, `RHS`.`x`) AS `x`, `LHS`.`y` AS `y.x`, `RHS`.`y` AS `y.y`, `z`
SELECT COALESCE(`RHS`.`x`, `LHS`.`x`) AS `x`, `LHS`.`y` AS `y.x`, `RHS`.`y` AS `y.y`, `z`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have changed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not really matter because a) they are equal (see the ON) and b) the COALESCE() is not really needed here but rather an artifact from simulating the full_join(). As the result should be correct it should not matter that the query is a bit more complicated than necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just slightly worried that this indicates we're now preferring the RHS of the join over the LHS in full joins?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I understand your concern but this was also the case before but just hidden because lazy frames are always named df. Before the second join query was actually

  SELECT
    COALESCE(`LHS`.`x`, `RHS`.`x`) AS `x`,
    `LHS`.`y` AS `y.x`,
    `z1`,
    `RHS`.`y` AS `y.y`,
    `z2`
  FROM `df2` AS `LHS`
  LEFT JOIN `df1` AS `RHS`
    ON (`LHS`.`x` = `RHS`.`x`)

So, df2 was LHS instead of RHS. Now this is

SELECT
    COALESCE(`RHS`.`x`, `LHS`.`x`) AS `x`,
    `LHS`.`y` AS `y.x`,
    `z1`,
    `RHS`.`y` AS `y.y`,
    `z2`
  FROM `df2` AS `RHS`
  LEFT JOIN `df1` AS `LHS`
    ON (`RHS`.`x` = `LHS`.`x`)

Changing the COALESCE would basically require an extra argument to sql_join_vars() or writing a custom left join query method for SQLite. I don't think this is worth it...

@mgirlich mgirlich merged commit 65d6b7e into main Mar 1, 2022
@mgirlich mgirlich deleted the join-custom-alias branch March 1, 2022 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants