Implement keep = T in full_join()
Fixes #4589
hadley committed Jan 13, 2020
1 parent ced9f6f commit ba3ff77
Showing 5 changed files with 29 additions and 15 deletions.
5 changes: 5 additions & 0 deletions NEWS.md
@@ -1,7 +1,12 @@
 # dplyr 0.9.0 (in development)
 
+* `full_join()` gains keep argument so that you can optionally choose to
+  keep both sets of join keys (#4589). This is useful when you want to
+  figure out which rows were missing from either side.
+
 * Join functions can now perform a cross-join by specifying `by = character()`
   (#4206.)
+
 * `filter()` and `summarise()` give better error messages.
 
 * Zero-arg `group_indices()` is deprecated; instead use `cur_group_id()`.
25 changes: 12 additions & 13 deletions R/join.r
@@ -88,7 +88,8 @@
 #' Should be a character vector of length 2.
 #' @param name The name of the list column nesting joins create.
 #' If `NULL` the name of `y` is used.
-#' @param keep If `TRUE` the by columns are kept in the nesting joins.
+#' @param keep Should the join keys from `y` be preserved in the output?
+#' Only applies to `nest_join()` and `full_join()`.
 #' @param ... Other parameters passed onto methods.
 #'
 #' For example, `na_matches` controls how `NA` values are handled when
@@ -232,7 +233,7 @@ full_join.data.frame <- function(x, y, by = NULL, copy = FALSE,
                                  na_matches = pkgconfig::get_config("dplyr::na_matches")) {
 
   y <- auto_copy(x, y, copy = copy)
-  join_mutate(x, y, by = by, type = "full", suffix = suffix, na_matches = na_matches)
+  join_mutate(x, y, by = by, type = "full", suffix = suffix, na_matches = na_matches, keep = keep)
 }
 
 #' @export
@@ -292,18 +293,16 @@ join_mutate <- function(x, y, by, type,
   x_out <- set_names(x[vars$x$out], names(vars$x$out))
   y_out <- set_names(y[vars$y$out], names(vars$y$out))
 
-  if (length(rows$y_extra) == 0) {
-    out <- vec_slice(x_out, rows$x)
-    out[names(x_key)] <- vec_cast(out[names(x_key)], vec_ptype2(x_key, y_key))
-    out[names(y_out)] <- vec_slice(y_out, rows$y)
-  } else {
-    out <- vec_slice(x_out, c(rows$x, rep_along(rows$y_extra, NA_integer_)))
-    out[names(x_key)] <- vec_rbind(
-      vec_slice(x_key, rows$x),
-      vec_slice(y_key, rows$y_extra)
-    )
-    out[names(y_out)] <- vec_slice(y_out, c(rows$y, rows$y_extra))
+  out <- vec_slice(x_out, c(rows$x, rep_along(rows$y_extra, NA_integer_)))
+  out[names(x_key)] <- vec_cast(out[names(x_key)], vec_ptype2(x_key, y_key))
+
+  # If we're not keeping all y keys, need to copy over for the new rows
+  if (!keep) {
+    new_rows <- length(rows$x) + seq_along(rows$y_extra)
+    out[new_rows, names(y_key)] <- vec_slice(y_key, rows$y_extra)
   }
+
+  out[names(y_out)] <- vec_slice(y_out, c(rows$y, rows$y_extra))
   out
 }

3 changes: 2 additions & 1 deletion man/join.Rd

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion man/join.data.frame.Rd

Some generated files are not rendered by default.

8 changes: 8 additions & 0 deletions tests/testthat/test-join.r
@@ -59,6 +59,14 @@ test_that("keys are coerced to symmetric type", {
   expect_type(inner_join(bar, foo, by = "id")$id, "character")
 })
 
+test_that("when keep = TRUE, full_join() preserves both sets of keys", {
+  df1 <- tibble(a = c(2, 3), b = c(1, 2))
+  df2 <- tibble(x = c(3, 4), y = c(3, 4))
+  out <- full_join(df1, df2, by = c("a" = "x"), keep = TRUE)
+  expect_equal(out$a, c(2, 3, NA))
+  expect_equal(out$x, c(NA, 3, 4))
+})
+
 test_that("joins matches NAs by default (#892, #2033)", {
   df1 <- tibble(x = c(NA_character_, 1))
   df2 <- tibble(x = c(NA_character_, 2))
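
The NEWS entry says `keep = TRUE` is useful for finding rows that were missing from either side. A minimal sketch of that use, reusing the data from the new test in this commit: the names `only_in_df1` and `only_in_df2` are hypothetical, and the approach assumes the key columns contain no `NA` values of their own, since an `NA` key is what marks the unmatched side here.

```r
library(dplyr)

# Same inputs as the new test in tests/testthat/test-join.r
df1 <- tibble(a = c(2, 3), b = c(1, 2))
df2 <- tibble(x = c(3, 4), y = c(3, 4))

# keep = TRUE preserves both key columns (`a` from df1, `x` from df2)
out <- full_join(df1, df2, by = c("a" = "x"), keep = TRUE)

# A missing key on one side means that side had no matching row.
# Hypothetical helper names, not part of dplyr:
only_in_df1 <- out[is.na(out$x), ]  # rows of df1 with no match in df2
only_in_df2 <- out[is.na(out$a), ]  # rows of df2 with no match in df1
```

Without `keep = TRUE`, the two key columns are merged into one, so this check on the keys alone is not possible.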