Skip to content

Implement cross_join()#6612

Merged
DavisVaughan merged 3 commits intotidyverse:mainfrom
DavisVaughan:feature/cross-join
Dec 17, 2022
Merged

Implement cross_join()#6612
DavisVaughan merged 3 commits intotidyverse:mainfrom
DavisVaughan:feature/cross-join

Conversation

@DavisVaughan
Copy link
Member

@DavisVaughan DavisVaughan commented Dec 16, 2022

Closes #6604

This PR implements cross_join() as an alternative to the by = character() join syntax. It also soft deprecates by = character(). Further rationale for cross_join() is provided in #6604 (comment).

Comment on lines -246 to -253
# Cross join for empty `join_by()` calls
cross <- n == 0L

new_join_by(
exprs = exprs,
condition = condition,
filter = filter,
cross = cross,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Felt "right" to be able to remove cross join handling from join_by()

Comment on lines -274 to +278
new_join_by <- function(exprs, condition, filter, cross, x, y) {
new_join_by <- function(exprs = list(),
condition = character(),
filter = character(),
x = character(),
y = character()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can never create empty join_by() calls, but we can internally make an empty new_join_by() object, which is useful for cross_join() and our backwards compatible support for by = character()

Comment on lines +331 to +336
if (!is_character(x_names)) {
abort("`by$x` must evaluate to a character vector.")
}
if (!is_character(y_names)) {
abort("`by$y` must evaluate to a character vector.")
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more error handling for some obscure support for by = list(x = , y = ), which I don't think many people use but it was causing a weird error without this

Comment on lines +87 to +88
x_out <- vec_rep_each(x_out, times = y_size)
y_out <- vec_rep(y_out, times = x_size)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be faster than the previous approach too. No need to actually "locate" the matches using vec_locate_matches().

y_unmatched <- unmatched$y

if (cross) {
# TODO: Remove this section when `by = character()` is defunct
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we do still need this section in join_rows() for backwards compatible support, we can't just call out to cross_join() or something.

Technically the unmatched = "error" and multiple = "error" args are supposed to do something even when you set by = character(). These aren't meaningful things to do IMO, which is why cross_join() doesn't have these args, but the simplest way to keep backwards compatible support here seemed to be to just continue to let it all run through vec_locate_matches(). I've added tests of the unmatched and multiple interaction with by = character().

I did go ahead and aggregate any usage of cross below into a single if (cross) branch that will be easy to remove in the future.

Comment on lines +784 to +786
is_cross_by <- function(x) {
identical(x, character()) || identical(x, list(x = character(), y = character()))
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 2nd option here is a technically valid way to do a cross join in CRAN dplyr

@DavisVaughan DavisVaughan requested a review from hadley December 16, 2022 16:31
@DavisVaughan DavisVaughan merged commit 28a4815 into tidyverse:main Dec 17, 2022
@DavisVaughan DavisVaughan deleted the feature/cross-join branch December 17, 2022 19:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cross_join() in place of by = character()

2 participants