Scoped mutate/summarize with multiple columns AND functions #4125

romainfrancois · 2019-01-24T13:28:30Z

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

storms %>% 
  summarize_at(vars(wind, pressure), list(mean, median)) 
#> # A tibble: 1 x 4
#>   `wind_<fn>_1` `pressure_<fn>_1` `wind_<fn>_2` `pressure_<fn>_2`
#>           <dbl>             <dbl>         <dbl>             <dbl>
#> 1          53.5              992.            45               999

^{Created on 2019-01-24 by the reprex package (v0.2.1.9000)}

romainfrancois · 2019-01-24T13:28:43Z

closes #4119

hadley · 2019-01-24T14:21:14Z

Our policy is to not create non-syntactic names, so <fn> needs to be something else.

hadley · 2019-01-24T14:22:15Z

R/colwise-mutate.R

@@ -309,6 +309,11 @@ manip_apply_syms <- function(funs, syms, tbl) {
    names(out) <- names(funs)
  } else {
    syms_names <- map_chr(syms, as_string)
+    names(funs) <- if_else(
+      names(funs) == "<fn>",


Where does <fn> get set as a name?

dplyr/R/funs.R

Line 65 in cf0e282

temp <- exprs_auto_name(temp)

Eventually we should add name repairing arguments to exprs_auto_name() (or its successor).

Can we fix it higher up? i.e. if the list is unnamed, we manually name it fn1, fn2 etc?

Maybe, and I guess as_fun_list() would be the right spot.

But it's all part of this complicated set of cases that we try to give nice defaults for, e.g. if there's only one function, named and unnamed do something different and are both legit:

library(dplyr, warn.conflicts = FALSE) summarise_at(iris, vars(starts_with("Sepal")), list(mean)) #> Sepal.Length Sepal.Width #> 1 5.843333 3.057333 summarise_at(iris, vars(starts_with("Sepal")), list(mean = mean)) #> Sepal.Length_mean Sepal.Width_mean #> 1 5.843333 3.057333

@romainfrancois Good point. We should only auto-name when there's more than one function.

I see that tibble creates ..n suffix to disambiguate names. Is that to draw user attention on these names? I think I'd prefer an n suffix in this case.

We actually do need to fix this higher up, otherwise this happens (on the second path):

> summarise_at(iris, vars(Sepal.Width), list(mean, median)) <fn> 1 3

lionel- · 2019-01-24T14:34:20Z

@hadley This is connected to the labelling of literals, i.e. in

transmute(mtcars, !!seq_len(nrow(mtcars)))

Currently the pillar type label is taken in that case.

romainfrancois · 2019-01-25T10:55:13Z

There's also the problem from #2912 when we use list of lambdas calling the same function:

library(dplyr, warn.conflicts = FALSE)

iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#>   Sepal.Length_quantile Sepal.Width_quantile
#> 1                   6.4                  3.3

iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(q25 = ~quantile(., .25), q75 = ~quantile(., .75)))
#>   Sepal.Length_q25 Sepal.Width_q25 Sepal.Length_q75 Sepal.Width_q75
#> 1              5.1             2.8              6.4             3.3

^{Created on 2019-01-25 by the reprex package (v0.2.1.9000)}

krlmlr · 2019-01-25T11:22:05Z

Can we use var..1, var..2 etc. as names? this is consistent with name repair in tibble which eventually should move to vctrs.?

lionel- · 2019-01-25T11:34:32Z

The name repairing should move to rlang I think, because of all the auto-naming in captured arguments. What is the motivation in using ..n as suffix by the way? Is that to draw user attention on variables that need to be named manually?

Edit: If yes, do we need to make a difference between names repair and names generation?

romainfrancois · 2019-01-25T14:02:45Z

I went the route of fn1, fn2 etc as hinted by @hadley in the thread above. But we still need some name repair/making strategy for esoteric things like:

library(dplyr, warn.conflicts = FALSE)

# both gets called "quantile"
iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#>   Sepal.Length_quantile Sepal.Width_quantile
#> 1                   6.4                  3.3

# both unnamed, so called fn1, fn2
iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(mean, median))
#>   Sepal.Length_fn1 Sepal.Width_fn1 Sepal.Length_fn2 Sepal.Width_fn2
#> 1         5.843333        3.057333              5.8               3

# what if we really wanted a fn1 ?
iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(mean, median, fn1 = sum))
#>   Sepal.Length_fn1 Sepal.Width_fn1 Sepal.Length_fn2 Sepal.Width_fn2
#> 1            876.5           458.6              5.8               3

This looks tangent to name repairing, or name generation as @lionel- hints. Do we need a package specific to naming things ? @jennybc

romainfrancois · 2019-01-25T14:24:25Z

Borrowing tibble:::universal_names() gives:

library(dplyr, warn.conflicts = FALSE)

iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#>   Sepal.Length_quantile..1 Sepal.Width_quantile..1
#> 1                      5.1                     2.8
#>   Sepal.Length_quantile..2 Sepal.Width_quantile..2
#> 1                      6.4                     3.3

iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(mean, median))
#>   Sepal.Length_fn..1 Sepal.Width_fn..1 Sepal.Length_fn..2
#> 1           5.843333          3.057333                5.8
#>   Sepal.Width_fn..2
#> 1                 3

iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(mean, median, fn..1 = mean))
#>   Sepal.Length_fn..1 Sepal.Width_fn..1 Sepal.Length_fn..2
#> 1           5.843333          3.057333                5.8
#>   Sepal.Width_fn..2 Sepal.Length_fn..3 Sepal.Width_fn..3
#> 1                 3           5.843333          3.057333

krlmlr · 2019-01-25T14:25:09Z

..n is the disambiguating suffix for duplicate names (and also empty names), ugly by intention. I'm fine with moving name repair to rlang.

CC @jennybc.

lionel- · 2019-01-25T15:11:52Z

That makes sense. In the case of repairing, we need some peppery typography to draw the attention to potential problems. In the case of name generation, using other rules might be ok? Producing readable labels might be a more important goal in that case.

romainfrancois · 2019-01-25T15:37:37Z

I'd still think in that case summarise_at(vars(starts_with("Sepal")), list(mean, median)) we want to draw attention

romainfrancois · 2019-01-25T15:47:14Z

Another random though is that perhaps when the function is S3 generic, maybe it could just be named after its name, i.e. what's inside UseMethod ?

krlmlr · 2019-01-25T16:40:55Z

We could support other name-generation rules, but for now it feels like ..1 is the best we have.

lionel- · 2019-01-25T16:44:25Z

Another random though is that perhaps when the function is S3 generic, maybe it could just be named after its name, i.e. what's inside UseMethod ?

Interesting idea, but it might make it more unpredictable for users to figure out when they need to supply names.

krlmlr · 2019-01-25T16:44:36Z

Do we want teach users to use lst() (or a better equivalent in vctrs) for auto-naming? I'm not sure we should be trying to infer function names, but we could also look up all functions in all namespaces.

lionel- · 2019-01-25T16:49:08Z

I'm not sure we should be trying to infer function names, but we could also look up all functions in all namespaces.

We don't have to scan all namespaces, only the topenv of the function. But I think we shouldn't go down this road because of performance implications and because this will cause different behaviours for exported and sourced functions. Also the user might have imported the function locally with another name.

romainfrancois · 2019-01-25T17:18:12Z

Yeah this would only buy some more time but then make a bigger monster.

Using some name repair function on generated names, or some name generation gives us ugly enough names for users to want to change them. We could inform() that we generated names, and hint about how to get in control.

romainfrancois · 2019-01-28T16:20:59Z

sort of related to #3769

krlmlr · 2019-01-29T09:47:17Z

R/colwise-mutate.R

-    #   nms[dup] <- paste(nms[dup], which(dup), sep = ".")
-    # }
+
+    # append ..<position> to duplicates


Would you mind copying the relevant parts of name repair from tibble to here, to make sure we're consistent? The current code will misbehave in some corner cases.

I can move the code to a compat-name-repair.R file in tibble so that you just need to copy that file.

Sure. Essentially, this needs universal_names(quiet = TRUE)

@krlmlr aren't we switching to use ... because ..n is non-syntactic?

With universal names we add another dot if necessary.

library(tibble) as_tibble(list(1, a = 2, a = 3), .name_repair = "unique") #> New names: #> * `` -> `..1` #> * a -> a..2 #> * a -> a..3 #> # A tibble: 1 x 3 #> ..1 a..2 a..3 #> <dbl> <dbl> <dbl> #> 1 1 2 3 as_tibble(list(1, a = 2, a = 3), .name_repair = "universal") #> New names: #> * `` -> ...1 #> * a -> a..2 #> * a -> a..3 #> # A tibble: 1 x 3 #> ...1 a..2 a..3 #> <dbl> <dbl> <dbl> #> 1 1 2 3

^{Created on 2019-01-29 by the reprex package (v0.2.1.9000)}

@romainfrancois: Can you please pick up from here, let me know if anything's missing: https://github.com/tidyverse/tibble/blob/r-2.0/R/compat-name-repair.R

…e'd need

…pend ..<position> in that case

romainfrancois · 2019-01-30T09:24:37Z

I think this is ready now, getting this. Prior to calling universal_names() from the compat file, <fn> becomes fn, otherwise we'd get wind_.fn...1

library(dplyr, warn.conflicts = FALSE)

storms %>%
  summarise_at(vars(wind, pressure), list(mean, median))
#> # A tibble: 1 x 4
#>   wind_fn..1 pressure_fn..1 wind_fn..2 pressure_fn..2
#>        <dbl>          <dbl>      <dbl>          <dbl>
#> 1       53.5           992.         45            999

iris %>% 
  summarise_at(vars(starts_with("Sepal")), list(~quantile(., .25), ~quantile(., .75)))
#>   Sepal.Length_quantile..1 Sepal.Width_quantile..1
#> 1                      5.1                     2.8
#>   Sepal.Length_quantile..2 Sepal.Width_quantile..2
#> 1                      6.4                     3.3

lock · 2019-09-07T14:58:20Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

romainfrancois requested a review from hadley January 24, 2019 13:29

hadley reviewed Jan 24, 2019

View reviewed changes

romainfrancois mentioned this pull request Jan 29, 2019

Implement a safer alternative to funs() #3433

Closed

romainfrancois force-pushed the 4119_colwise_unnamed branch from a2db1d6 to d10f01b Compare January 29, 2019 09:19

krlmlr reviewed Jan 29, 2019

View reviewed changes

romainfrancois added 6 commits January 29, 2019 11:16

tests for multiple columns + multiple unnamed functions #4119

07fa63c

tests for multiple columns + multiple unnamed functions #4119

582f97d

disregard the have_name attribute, suffix anything named just <fn>

0dbb94c

unnamed -> fn1, fn2, ...

eeb8eee

just illustrating that tibble::universal_names() gets close to what w…

0c52273

…e'd need

not explicitely use universal_names, just check for duplictaes and ap…

b173332

…pend ..<position> in that case

romainfrancois force-pushed the 4119_colwise_unnamed branch from 08d6f1b to b173332 Compare January 29, 2019 10:16

romainfrancois added this to the 0.8.0 milestone Jan 29, 2019

using compat-name-repair.R from tibble, for universal_names()

5cfde4a

romainfrancois merged commit fc220bb into master Jan 30, 2019

romainfrancois mentioned this pull request Feb 2, 2019

Documentation of name repair in scoped "mutating" verbs #4146

Closed

romainfrancois deleted the 4119_colwise_unnamed branch March 11, 2019 14:55

lock bot locked and limited conversation to collaborators Sep 7, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scoped mutate/summarize with multiple columns AND functions #4125

Scoped mutate/summarize with multiple columns AND functions #4125

romainfrancois commented Jan 24, 2019

romainfrancois commented Jan 24, 2019

hadley commented Jan 24, 2019

hadley Jan 24, 2019

romainfrancois Jan 24, 2019

lionel- Jan 24, 2019 •

edited

hadley Jan 24, 2019

romainfrancois Jan 24, 2019

lionel- Jan 24, 2019

romainfrancois Jan 25, 2019

lionel- commented Jan 24, 2019

romainfrancois commented Jan 25, 2019

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019 •

edited

romainfrancois commented Jan 25, 2019 •

edited

romainfrancois commented Jan 25, 2019 •

edited

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019

romainfrancois commented Jan 25, 2019

romainfrancois commented Jan 25, 2019

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019

romainfrancois commented Jan 25, 2019

romainfrancois commented Jan 28, 2019

krlmlr Jan 29, 2019

romainfrancois Jan 29, 2019

hadley Jan 29, 2019

krlmlr Jan 29, 2019

krlmlr Jan 29, 2019

romainfrancois commented Jan 30, 2019

lock bot commented Sep 7, 2019

Scoped mutate/summarize with multiple columns AND functions #4125

Scoped mutate/summarize with multiple columns AND functions #4125

Conversation

romainfrancois commented Jan 24, 2019

romainfrancois commented Jan 24, 2019

hadley commented Jan 24, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lionel- Jan 24, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lionel- commented Jan 24, 2019

romainfrancois commented Jan 25, 2019

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019 • edited

romainfrancois commented Jan 25, 2019 • edited

romainfrancois commented Jan 25, 2019 • edited

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019

romainfrancois commented Jan 25, 2019

romainfrancois commented Jan 25, 2019

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019

krlmlr commented Jan 25, 2019

lionel- commented Jan 25, 2019

romainfrancois commented Jan 25, 2019

romainfrancois commented Jan 28, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

romainfrancois commented Jan 30, 2019

lock bot commented Sep 7, 2019

lionel- Jan 24, 2019 •

edited

lionel- commented Jan 25, 2019 •

edited

romainfrancois commented Jan 25, 2019 •

edited

romainfrancois commented Jan 25, 2019 •

edited