Support relationship parameter in join verbs #1305

Closed
bairdj opened this issue Jun 15, 2023 · 1 comment

@bairdj
Contributor

bairdj commented Jun 15, 2023

dplyr added a relationship parameter to its join verbs in v1.1.1, but passing it to a dbplyr join throws an error.

I ran into this in some unit tests that run against both local and lazy tables. dplyr throws a warning if there is a many-to-many relationship. The warning can be silenced by providing the relationship parameter, but doing so causes the lazy table tests to fail.

I don't think dbplyr necessarily needs to support the check (it would probably take an extra query to detect a violation), but it would be good if it accepted the parameter.
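To illustrate what such an extra query might look like, here is a minimal sketch of a many-to-one check: it counts rows per key on the right-hand table and reports whether any key appears more than once. The helper name `has_duplicate_keys()` is hypothetical and not part of dplyr or dbplyr. On a lazy table the pipeline should translate to a single grouped query, but it is only exercised on local data here.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr or dbplyr): returns TRUE when any
# value of `key` occurs more than once in `y`, i.e. when `y` cannot be the
# "one" side of a many-to-one join.
has_duplicate_keys <- function(y, key) {
  duplicated_keys <- y %>%
    count({{ key }}, name = "n_rows") %>%  # rows per key value
    filter(n_rows > 1) %>%                 # keep only repeated keys
    head(1) %>%                            # one offending key is enough
    collect()                              # no-op for local data frames
  nrow(duplicated_keys) > 0
}

join_df <- tibble(
  Species = c("setosa", "versicolor", "virginica"),
  x = 1:3
)

has_duplicate_keys(join_df, Species)
has_duplicate_keys(iris, Species)
```

The first call checks the lookup table from the reprex, where each species appears once; the second checks `iris`, where each species appears 50 times.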

library(dplyr, warn.conflicts = FALSE)
#> Warning: package 'dplyr' was built under R version 4.2.3
library(dbplyr, warn.conflicts = FALSE)
#> Warning: package 'dbplyr' was built under R version 4.2.3

join_df <- tibble(
  Species = c("setosa", "versicolor", "virginica"),
  x = 1:3
)

# dplyr join
iris %>%
  inner_join(
    join_df,
    by = join_by(Species),
    relationship = "many-to-one"
  ) %>%
  slice_head(n = 5)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species x
#> 1          5.1         3.5          1.4         0.2  setosa 1
#> 2          4.9         3.0          1.4         0.2  setosa 1
#> 3          4.7         3.2          1.3         0.2  setosa 1
#> 4          4.6         3.1          1.5         0.2  setosa 1
#> 5          5.0         3.6          1.4         0.2  setosa 1

# dbplyr join
iris_dbplyr <- tbl_lazy(iris, con = simulate_mssql())

iris_dbplyr %>%
  inner_join(
    join_df,
    by = join_by(Species),
    relationship = "many-to-one"
  ) %>%
  slice_head(n = 5)
#> Error in `inner_join()`:
#> ! `...` must be empty.
#> ✖ Problematic argument:
#> • relationship = "many-to-one"
#> Backtrace:
#>     ▆
#>  1. ├─... %>% slice_head(n = 5)
#>  2. ├─dplyr::slice_head(., n = 5)
#>  3. ├─dplyr::inner_join(., join_df, by = join_by(Species), relationship = "many-to-one")
#>  4. └─dbplyr:::inner_join.tbl_lazy(., join_df, by = join_by(Species), relationship = "many-to-one")
#>  5.   └─rlang::check_dots_empty()
#>  6.     └─rlang:::action_dots(...)
#>  7.       ├─base (local) try_dots(...)
#>  8.       └─rlang (local) action(...)

Created on 2023-06-15 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 (2022-10-31 ucrt)
#>  os       Windows 10 x64 (build 19045)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United Kingdom.utf8
#>  ctype    English_United Kingdom.utf8
#>  tz       Europe/London
#>  date     2023-06-15
#>  pandoc   3.1 @ C:/Users/JBaird/AppData/Local/Pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.2.3)
#>  DBI           1.1.3   2022-06-18 [1] CRAN (R 4.2.2)
#>  dbplyr      * 2.3.2   2023-03-21 [1] CRAN (R 4.2.3)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr       * 1.1.2   2023-04-20 [1] CRAN (R 4.2.3)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.2.3)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.2.2)
#>  fs            1.6.2   2023-04-25 [1] CRAN (R 4.2.3)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.5   2023-03-23 [1] CRAN (R 4.2.3)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.2.3)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.2.3)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.3)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.3)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.3)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.2.3)
#>  rmarkdown     2.22    2023-06-01 [1] CRAN (R 4.2.3)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.3)
#>  styler        1.9.1   2023-03-04 [1] CRAN (R 4.2.3)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.2.3)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.2.2)
#>  vctrs         0.6.2   2023-04-19 [1] CRAN (R 4.2.3)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.2.3)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
#> 
#>  [1] C:/Users/JBaird/AppData/Local/Programs/R/R-4.2.2/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@mgirlich
Collaborator

I'm afraid it doesn't make sense to accept the relationship argument. Doing so would raise the expectation that dbplyr actually supports this argument, which isn't the case.
It would therefore make more sense for you to adapt your tests, e.g. by using a custom join function that ignores the relationship argument:

my_inner_join <- function(x,
                          y,
                          by = NULL,
                          copy = FALSE,
                          suffix = NULL,
                          ...,
                          keep = NULL,
                          na_matches = c("never", "na"),
                          multiple = NULL,
                          unmatched = "drop",
                          relationship = NULL,
                          sql_on = NULL,
                          auto_index = FALSE,
                          x_as = NULL,
                          y_as = NULL) {
  # `relationship` is accepted above but deliberately not forwarded.
  inner_join(
    x,
    y,
    by = by,
    copy = copy,
    suffix = suffix,
    keep = keep,
    na_matches = na_matches,
    multiple = multiple,
    unmatched = unmatched,
    sql_on = sql_on,
    auto_index = auto_index,
    x_as = x_as,
    y_as = y_as
  )
}
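For a test suite that runs over both local and lazy tables, a variant of this wrapper could forward relationship only when the left-hand table is local. The following is a minimal sketch under that assumption. The name `inner_join_compat()` is hypothetical, and the `inherits(x, "tbl_lazy")` check is just one way to tell the two cases apart.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr or dbplyr): drops `relationship`
# for lazy tables and forwards it for local data frames.
inner_join_compat <- function(x, y, by = NULL, ..., relationship = NULL) {
  if (inherits(x, "tbl_lazy")) {
    # Lazy table: the relationship check is skipped entirely.
    inner_join(x, y, by = by, ...)
  } else {
    # Local table: dplyr validates the relationship as usual.
    inner_join(x, y, by = by, ..., relationship = relationship)
  }
}

join_df <- tibble(
  Species = c("setosa", "versicolor", "virginica"),
  x = 1:3
)

# Local join: `relationship` is passed on to dplyr.
local_result <- inner_join_compat(
  iris,
  join_df,
  by = join_by(Species),
  relationship = "many-to-one"
)

# Lazy join: both inputs are lazy tables, and `relationship` is dropped.
lazy_result <- inner_join_compat(
  tbl_lazy(iris, con = simulate_mssql()),
  tbl_lazy(join_df, con = simulate_mssql()),
  by = join_by(Species),
  relationship = "many-to-one"
)
```

Note that the lazy branch performs no relationship check at all, so a violated many-to-one assumption would only be caught by the local tests.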

bairdj pushed a commit to bairdj/dbplyr that referenced this issue Jun 15, 2023
mgirlich added a commit that referenced this issue Jun 15, 2023
* Support relationship in joins

* Use `check_unsupported_arg()`

* Update news for #1305

* Update NEWS.md

Co-authored-by: Maximilian Girlich <maximilian.girlich@metoda.com>
