Skip to content

across() improvements #335

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Feb 10, 2022
Merged

across() improvements #335

merged 18 commits into from
Feb 10, 2022

Conversation

mgirlich
Copy link
Collaborator

@mgirlich mgirlich commented Feb 7, 2022

Fixes the third point in #287.

  • Refactor dt_squash_if() and dt_squash_across() to use across_setup(), very similar to dplyr::across_setup().
  • The naming is now more consistent with dplyr. When passing a list to .fns missing names are not guessed from the function but filled with a number.
  • Characters cannot be used as function anymore (like in dplyr).
  • Fixed some minor issues for if_all().
  • Increase test coverage for if_all().
  • Use inject() instead of deprecated expr_interp()

* Refactor out common parts for `dt_squash_across()` and `dt_squash_if()`. This fixes two things for `if_all()`: i) evaluate dots in correct env and ii) can use multiple functions
* Formulas can now use `.y`
@markfairbanks
Copy link
Collaborator

markfairbanks commented Feb 7, 2022

I like the simplification part of this PR - but before I do a code review I wanted to discuss the use of ... with a purrr style lambda.

Basically I think we should error when ... is used with a purrr style lambda (reasoning below).

In #287 @eutwt mentioned being able to do this:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:3, y = 1:3)

df %>%
  mutate(
    across(c(x, y), ~ .x + .y, 1)
  )
#> # A tibble: 3 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2     2
#> 2     3     3
#> 3     4     4

However .y here could just as easily be ..2. And this can be extended to use more dots:

library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:3, y = 1:3)

df %>%
  mutate(
    across(c(x, y), ~ .x + ..2 + ..3 + ..4, 1, 2, 3)
  )
#> # A tibble: 3 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     7     7
#> 2     8     8
#> 3     9     9

@mgirlich maybe you have a good idea to deal with situations like this with an arbitrary number of dots? But instead of supporting .y (or the case of only one dot with a purrr style lambda), I would rather error if dots are used.

In the across() docs for dplyr 1.0.8 it mentions the use of dots is discouraged. So I think it makes sense to have them error in this edge case in dtplyr.

Open to thoughts though.

@eutwt
Copy link
Collaborator

eutwt commented Feb 8, 2022

Although I find the ability to use ... in across useful, I think I agree dtplyr probably should not support it given that it's likely to be deprecated in dplyr 1.1.0. Apologies for contradicting my own issue / not keeping that issue current.

@mgirlich
Copy link
Collaborator Author

mgirlich commented Feb 8, 2022

Fine for me to remove to support for dots, especially for purrr style lambdas.
I didn't even know you could use ..2, ..3, etc for purrr style lambdas and to me your example just looks... terrible.

@markfairbanks
Copy link
Collaborator

markfairbanks commented Feb 8, 2022

I think we should definitely error in the case where dots are used with a purrr-style lambda. It doesn't currently work anyway so it won't break backwards compatibility.

However deprecating dots usage in general should be done with a deprecation lifecycle (if at all) since it currently works in dtplyr and also works in dplyr. I think a case like this should still work (without warnings) until dplyr makes the deprecation first.

df <- data.table(x = 1:3, y = 1:3)

df %>%
  summarize(
    across(c(x, y), mean, na.rm = TRUE)
  )

} else if (is_call(funs, "list")) {
args <- rlang::exprs_auto_name(funs[-1])
lapply(args, across_fun, env, data, j = j)
args <- rlang::call_args(funs)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can drop rlang:: here

Comment on lines 55 to 57
"`dtplyr` does not support `...` in `across()` and `if_all()`.",
i = "Use a lambda instead.",
i = "Or inline them via purrr-style lambdas."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this say something like "`dtplyr` does not support `...` in `across()` and `if_all()` when a lambda is used in `.fns`."?

@mgirlich mgirlich merged commit 45528ec into main Feb 10, 2022
@markfairbanks markfairbanks deleted the across-improvements branch February 10, 2022 08:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants