Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inner_join incorrectly throws error #6835

Closed
lymanmark opened this issue Apr 26, 2023 · 1 comment · Fixed by r-lib/vctrs#1835
Closed

inner_join incorrectly throws error #6835

lymanmark opened this issue Apr 26, 2023 · 1 comment · Fixed by r-lib/vctrs#1835

Comments

@lymanmark
Copy link

When using inequality joins and setting relationship to "one-to-one", an error is thrown despite the criteria for a "one-to-one" match being satisfied.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

x <- data.frame(
  x = 543,
  y = 178,
  x_max = 567,
  x_text = "Text 1"
)

y <-
  data.frame(
    x = c(457, 450),
    y = c(114, 167),
    x_max = c(606, 660),
    y_text = c("Text 2", "Text 3")
  )

# Returns one row as expected
inner_join(
  x,
  y,
  by = join_by(
    overlaps(x, x_max, x, x_max),
    closest(y >= y)
  )
)
#>   x.x y.x x_max.x x_text x.y y.y x_max.y y_text
#> 1 543 178     567 Text 1 450 167     660 Text 3

# Throws error with multiple matches
inner_join(
  x,
  y,
  by = join_by(
    overlaps(x, x_max, x, x_max),
    closest(y >= y)
  ),
  relationship = "one-to-one"
)
#> Error in `inner_join()`:
#> ! Each row in `x` must match at most 1 row in `y`.
#> ℹ Row 1 of `x` matches multiple rows in `y`.
#> Backtrace:
#>      ▆
#>   1. ├─dplyr::inner_join(...)
#>   2. ├─dplyr:::inner_join.data.frame(...)
#>   3. │ └─dplyr:::join_mutate(...)
#>   4. │   └─dplyr:::join_rows(...)
#>   5. │     └─dplyr:::dplyr_locate_matches(...)
#>   6. │       ├─base::withCallingHandlers(...)
#>   7. │       └─vctrs::vec_locate_matches(...)
#>   8. ├─vctrs:::stop_matches_relationship_one_to_one(...)
#>   9. │ └─vctrs:::stop_matches_relationship(...)
#>  10. │   └─vctrs:::stop_matches(...)
#>  11. │     └─vctrs:::stop_vctrs(...)
#>  12. │       └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
#>  13. │         └─rlang:::signal_abort(cnd, .file)
#>  14. │           └─base::signalCondition(cnd)
#>  15. └─dplyr (local) `<fn>`(`<vc______>`)
#>  16.   └─dplyr:::rethrow_error_join_relationship_one_to_one(cnd, error_call)
#>  17.     └─dplyr:::stop_join_matches_multiple(...)
#>  18.       └─dplyr:::stop_join(...)
#>  19.         └─dplyr:::stop_dplyr(...)
#>  20.           └─rlang::abort(...)

Created on 2023-04-26 with reprex v2.0.2

@DavisVaughan
Copy link
Member

Minimal reprex

library(vctrs)

a <- data_frame(x = 1, y = 5, z = 3)
b <- data_frame(x = c(1, 2), y = c(4, 3), z = c(1, 2))

vec_locate_matches(
  a,
  b,
  condition = c("<=", ">=", ">="),
  filter = c("none", "none", "max"),
  relationship = "none"
)
#>   needles haystack
#> 1       1        2

vec_locate_matches(
  a,
  b,
  condition = c("<=", ">=", ">="),
  filter = c("none", "none", "max"),
  relationship = "one-to-one"
)
#> Error in `vec_locate_matches()`:
#> ! Each value of `needles` can match at most 1 value from `haystack`.
#> ✖ Location 1 of `needles` matches multiple values.
#> Backtrace:
#>     ▆
#>  1. ├─vctrs::vec_locate_matches(...)
#>  2. └─vctrs:::stop_matches_relationship_one_to_one(...) at vctrs/R/match.R:320:2
#>  3.   └─vctrs:::stop_matches_relationship(...) at vctrs/R/match.R:491:2
#>  4.     └─vctrs:::stop_matches(...) at vctrs/R/match.R:563:2
#>  5.       └─vctrs:::stop_vctrs(...) at vctrs/R/match.R:360:2
#>  6.         └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call) at vctrs/R/conditions.R:64:2

Notably this doesn't error, so it has something to do with nested containment groups because b in the above example isn't fully ordered on all columns.

a <- data_frame(x = 1, y = 5, z = 3)
b <- data_frame(x = c(1, 2), y = c(3, 4), z = c(1, 2))

vec_locate_matches(
  a,
  b,
  condition = c("<=", ">=", ">="),
  filter = c("none", "none", "max"),
  relationship = "one-to-one"
)
#>   needles haystack
#> 1       1        2

netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 17, 2023
# vctrs 0.6.3

* Fixed an issue where certain ALTREP row names were being materialized when
  passed to `new_data_frame()`. We've fixed this by removing a safeguard in
  `new_data_frame()` that performed a compatibility check when both `n` and
  `row.names` were provided. Because this is a low level function designed for
  performance, it is up to the caller to ensure these inputs are compatible
  (tidyverse/dplyr#6596).

* Fixed an issue where `vec_set_*()` used with data frames could accidentally
  return an object with the type of the proxy rather than the type of the
  original inputs (#1837).

* Fixed a rare `vec_locate_matches()` bug that could occur when using a max/min
  `filter` (tidyverse/dplyr#6835).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants