Prevent mutate() from updating columns sequentially #4517

Closed
lionel- opened this issue Aug 2, 2019 · 3 comments
Labels
feature (a feature request or enhancement), selection 🧺 (tidyselect, scoped verbs, etc.)

Comments

lionel- (Member) commented Aug 2, 2019

Inlined lambdas can refer to existing columns:

library(dplyr, warn.conflicts = FALSE)

as_tibble(mtcars) %>% mutate_all(~ .x / sd(disp))
#> # A tibble: 32 x 11
#>      mpg    cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1 0.169 0.0484 1.29    110  3.9   2.62  16.5     0     1     4     4
#>  2 0.169 0.0484 1.29    110  3.9   2.88  17.0     0     1     4     4
#>  3 0.184 0.0323 0.871    93  3.85  2.32  18.6     1     1     4     1
#>  4 0.173 0.0484 2.08    110  3.08  3.22  19.4     1     0     3     1
#>  5 0.151 0.0645 2.90    175  3.15  3.44  17.0     0     0     3     2
#>  6 0.146 0.0484 1.82    105  2.76  3.46  20.2     1     0     3     1
#>  7 0.115 0.0645 2.90    245  3.21  3.57  15.8     0     0     3     4
#>  8 0.197 0.0323 1.18     62  3.69  3.19  20       1     0     4     2
#>  9 0.184 0.0323 1.14     95  3.92  3.15  22.9     1     0     4     2
#> 10 0.155 0.0484 1.35    123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows

However, as you can see, this feature is currently hard to use because the input tibble is mutated as the map progresses. Here only the columns up to and including disp are divided by the standard deviation of disp. Once disp itself has been rescaled, sd(disp) is 1, so every later column (hp onward) is left unchanged.
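To make the sequential behaviour concrete without relying on dplyr, here is a base R sketch. It is an illustration of the semantics only, not how mutate_all() is implemented: it loops over the columns of mtcars and rescales each one, reading disp from the data frame as it is being updated.

```r
# Base R sketch of sequential updating (not dplyr's implementation):
# each column is divided by sd(disp), where `disp` is read from the
# data frame as it stands at that point in the loop.
out <- mtcars
for (nm in names(out)) {
  out[[nm]] <- out[[nm]] / sd(out$disp)
}

# mpg, cyl and disp were divided by the original sd(disp); after disp was
# rescaled its sd is 1 (up to floating-point error), so later columns keep
# their original values.
head(out[, c("mpg", "disp", "hp")], 3)
```

This mirrors the tibble output above: mpg, cyl and disp are rescaled while hp and the remaining columns are not.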

This is consistent with mutate(), since mapping a function just templates an expression over the columns of a data frame, but it makes the result hard to reason about. It also makes certain patterns hard to express (https://community.rstudio.com/t/dplyr-scoped-verbs-questions-about-scoped-verbs-sequential-execution/36609/5).

Could mutate() and summarise() have a parameter that prevents columns from being updated until the end? It would be useful with mapping(), scoped verbs, and the speculative "superstache" operator {{{ for templating expressions.
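The proposed parameter amounts to letting every expression see the input as it was before any column was modified. A minimal base R sketch of that snapshot behaviour, assuming a hypothetical helper `map_columns_snapshot()` that exists only for illustration (it is neither a dplyr function nor the proposed parameter's implementation):

```r
# Hypothetical helper, not part of dplyr: applies `f` to every column of
# `df`, passing a frozen copy of the input so that later columns never
# see values updated earlier in the loop.
map_columns_snapshot <- function(df, f) {
  snapshot <- df                      # taken once, before any update
  for (nm in names(df)) {
    df[[nm]] <- f(df[[nm]], snapshot)
  }
  df
}

# Every column, including hp and later ones, is divided by the original sd(disp).
res <- map_columns_snapshot(mtcars, function(x, data) x / sd(data$disp))
head(res[, c("mpg", "disp", "hp")], 3)
```

Here hp is rescaled too (110 / sd(disp) is roughly 0.888 for the first row), which should agree with the across() output shown later in this thread.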

lionel- changed the title from "Value semantics for scoped verbs / mapping()" to "Prevent mutate() from updating columns sequentially" on Aug 2, 2019
BenCarlsen commented Aug 2, 2019

I think the title could be a little clearer, since sequential updating is an intrinsic feature of mutate().

What you're describing is an optional parameter to mutate() and summarize() that changes their behavior so that, for the example above, they give the same result as:

as_tibble(mtcars) %>%
  imap_dfc(~ transmute(parent.env(environment())$., !!.y := .x / sd(disp)))

Solutions using purrr can become quite complex to express for operations other than the _all variants, and they won't give optimal performance against a database backend. That is one reason it would be great to bring this functionality into dplyr/dbplyr.

romainfrancois (Member) commented Nov 26, 2019

That's another case for across() from #4586:

library(dplyr, warn.conflicts = FALSE)

as_tibble(mtcars) %>% 
  mutate(across(everything(), ~ .x / sd(disp)))
#> # A tibble: 32 x 11
#>      mpg    cyl  disp    hp   drat     wt  qsec      vs      am   gear
#>    <dbl>  <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>   <dbl>   <dbl>  <dbl>
#>  1 0.169 0.0484 1.29  0.888 0.0315 0.0211 0.133 0       0.00807 0.0323
#>  2 0.169 0.0484 1.29  0.888 0.0315 0.0232 0.137 0       0.00807 0.0323
#>  3 0.184 0.0323 0.871 0.750 0.0311 0.0187 0.150 0.00807 0.00807 0.0323
#>  4 0.173 0.0484 2.08  0.888 0.0249 0.0259 0.157 0.00807 0       0.0242
#>  5 0.151 0.0645 2.90  1.41  0.0254 0.0278 0.137 0       0       0.0242
#>  6 0.146 0.0484 1.82  0.847 0.0223 0.0279 0.163 0.00807 0       0.0242
#>  7 0.115 0.0645 2.90  1.98  0.0259 0.0288 0.128 0       0       0.0242
#>  8 0.197 0.0323 1.18  0.500 0.0298 0.0257 0.161 0.00807 0       0.0323
#>  9 0.184 0.0323 1.14  0.767 0.0316 0.0254 0.185 0.00807 0       0.0323
#> 10 0.155 0.0484 1.35  0.992 0.0316 0.0278 0.148 0.00807 0       0.0323
#> # … with 22 more rows, and 1 more variable: carb <dbl>

Created on 2019-11-26 by the reprex package (v0.3.0.9000)

hadley added the labels "feature" (a feature request or enhancement) and "selection 🧺" (tidyselect, scoped verbs, etc.) on Dec 10, 2019

hadley (Member) commented Jan 6, 2020

Closing since we'll retire the existing scoped variants (#4702) so this will no longer be an issue.

hadley closed this as completed on Jan 6, 2020

4 participants