Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

should mutate_at lambdas be able to refer to other columns in .data? #4183

Closed
MilesMcBain opened this issue Feb 15, 2019 · 17 comments
Closed

should mutate_at lambdas be able to refer to other columns in .data? #4183

MilesMcBain opened this issue Feb 15, 2019 · 17 comments
Labels
Milestone

Comments

@MilesMcBain
Copy link

@MilesMcBain MilesMcBain commented Feb 15, 2019

As a simple example, let's say you have a normalising constant in one dataframe column and you need to normalise many other columns. mutate_at seems like it would be convenient to do this, but you can't refer to other columns in the dataframe apart from . cleanly.

Example:

library(tidyverse)

## Works but hardcoding dataset name feels lame.
iris %>%
  mutate_at(vars(starts_with("Sepal")),
            ~ . / iris$Petal.Width) %>%
  head()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5       17.50          1.4         0.2  setosa
#> 2         24.5       15.00          1.4         0.2  setosa
#> 3         23.5       16.00          1.3         0.2  setosa
#> 4         23.0       15.50          1.5         0.2  setosa
#> 5         25.0       18.00          1.4         0.2  setosa
#> 6         13.5        9.75          1.7         0.4  setosa

## Fails
iris %>%
  mutate_at(vars(starts_with("Sepal")),
            ~ . / Petal.Width) %>%
  head()
#> Error in mutate_impl(.data, dots): Evaluation error: object 'Petal.Width' not found.

## Fails differently
iris %>%
  mutate_at(vars(starts_with("Sepal")),
            ~ . / .data$Petal.Width) %>%
  head()
#> Error in mutate_impl(.data, dots): Column `Sepal.Length` must be length 150 (the number of rows) or one, not 0

Created on 2019-02-15 by the reprex package (v0.2.1)

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 15, 2019

Not sure what's happening here, but this works:

library(dplyr, warn.conflicts = FALSE)

iris %>%
  mutate_at(vars(starts_with("Sepal")),
            list(~ . / .data$Petal.Width)) %>%
  head()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5       17.50          1.4         0.2  setosa
#> 2         24.5       15.00          1.4         0.2  setosa
#> 3         23.5       16.00          1.3         0.2  setosa
#> 4         23.0       15.50          1.5         0.2  setosa
#> 5         25.0       18.00          1.4         0.2  setosa
#> 6         13.5        9.75          1.7         0.4  setosa

Created on 2019-02-15 by the reprex package (v0.2.1.9000)

@MilesMcBain
Copy link
Author

@MilesMcBain MilesMcBain commented Feb 15, 2019

Oh thanks!

Well that sort of solves MY problem, but there seems to be some inconsistency here worth cleaning up. I also discovered this, which really feels like it should work:

library(tidyverse)

## fails
iris %>% 
  mutate_at(vars(starts_with("Sepal")),
            function(col, normaliser){
              col / normaliser
            },
            .data$Petal.Width) %>%
  head()
#> Error in mutate_impl(.data, dots): Column `Sepal.Length` must be length 150 (the number of rows) or one, not 0

## works
iris %>%
  mutate_at(vars(starts_with("Sepal")),
            function(col, normaliser){
              col / normaliser
            },
            iris$Petal.Width) %>%
  head()
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5       17.50          1.4         0.2  setosa
#> 2         24.5       15.00          1.4         0.2  setosa
#> 3         23.5       16.00          1.3         0.2  setosa
#> 4         23.0       15.50          1.5         0.2  setosa
#> 5         25.0       18.00          1.4         0.2  setosa
#> 6         13.5        9.75          1.7         0.4  setosa

Created on 2019-02-15 by the reprex package (v0.2.1)

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 15, 2019

If .funs is a formula, it is processed by rlang::as_function(), but, if .funs is a list, it is processed by dplyr:::as_fun(). Probably, the former should also be handled by as_fun()? (Or, does as_function() need .env argument specified?)

dplyr/R/funs.R

Lines 85 to 99 in ce7f2bc

# Take functions by expression if they are supplied by name. This
# way we can evaluate it hybridly.
if (is_function(.x) && quo_is_symbol(.quo)) {
.x <- list(.quo)
} else if (is_character(.x)) {
.x <- as.list(.x)
} else if (is_bare_formula(.x, lhs = FALSE)) {
.x <- list(as_function(.x))
} else if (!is_list(.x)) {
.x <- list(.x)
}
funs <- map(.x, as_fun, .env = fun_env(.quo, .env), args)
new_funs(funs)
}

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 15, 2019

I thought we can just remove this if branch, but things seem a bit more complicated...

dplyr/R/funs.R

Lines 91 to 92 in d2e77ee

} else if (is_bare_formula(.x, lhs = FALSE)) {
.x <- list(as_function(.x))

The change make some tests fail, because as_fun() is yet different from as_function() in that it accepts . only.

library(dplyr, warn.conflicts = FALSE)
library(rlang)

d <- head(mtcars, 2)

# c.f. https://github.com/tidyverse/dplyr/blob/d2e77ee6a83cd301cb7c1e73383347d69e757a9d/tests/testthat/test-colwise-select.R#L104
select_if(d, ~ is_integerish(.x))
#>               mpg cyl disp  hp vs am gear carb
#> Mazda RX4      21   6  160 110  0  1    4    4
#> Mazda RX4 Wag  21   6  160 110  0  1    4    4

select_if(d, list(~ is_integerish(.x)))
#> Error: Can't convert a list to function

Created on 2019-02-15 by the reprex package (v0.2.1)

@romainfrancois romainfrancois added this to the 0.8.1 milestone Feb 16, 2019
@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 17, 2019

Should this work? (might be related to #4150)

library(dplyr, warn.conflicts = FALSE)
summarise_at(data.frame(x = c(1:10, NA)), vars(starts_with("x")), ~mean(.), na.rm = TRUE)
#>    x
#> 1 NA

Created on 2019-02-17 by the reprex package (v0.2.1)

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 17, 2019

In summary, we can provide the followings as .funs:

  • a function (either pre-defined or defined in-place)
  • a name of a function (deprecated, kept just for backword-compatibility)
  • a purrr-style lambda function notation

or

  • a list of either form above
library(dplyr, warn.conflicts = FALSE)

# a function
summarise_at(data.frame(x = c(1:10, NA)), vars(x), mean, na.rm = TRUE)
#>     x
#> 1 5.5
summarise_at(data.frame(x = c(1:10, NA)), vars(x), function(x, ...) mean(x, ...), na.rm = TRUE)
#>     x
#> 1 5.5

# a name of a function
summarise_at(data.frame(x = c(1:10, NA)), vars(x), "mean", na.rm = TRUE)
#>     x
#> 1 5.5

# a purrr-style lambda notation
summarise_at(data.frame(x = c(1:10, NA)), vars(x), ~mean(.), na.rm = TRUE)
#>    x
#> 1 NA


# wrapped by list
summarise_at(data.frame(x = c(1:10, NA)), vars(x), 
             list(mean, function(x, ...) mean(x, ...), "mean", ~mean(.)), 
             na.rm = TRUE)
#>   fn1 fn2 mean..3 mean..4
#> 1 5.5 5.5     5.5     5.5

Created on 2019-02-17 by the reprex package (v0.2.1)

Among those, currently it's only a bare purrr-style lambda function notation that is handled by rlang::as_function() and ignores .... Semantically, I feel this is the right behaviour. I don't think na.rm = TRUE should be passed to mean() in the following.

summarise_at(data.frame(x = c(1:10, NA)), vars(starts_with("x")), ~mean(.), na.rm = TRUE)

But... probably this is needed for backward-compatibility for funs()?

By the way, I'm not sure if this should work. Is ... inside of the context of mutate()?

iris %>% 
  mutate_at(vars(starts_with("Sepal")),
            function(col, normaliser){
              col / normaliser
            },
            .data$Petal.Width) %>%
  head()

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Feb 22, 2019

All these should work:

library(dplyr, warn.conflicts = FALSE)

iris <- head(iris, 3)

iris %>%
  mutate_if(is.numeric, ~ . / iris$Petal.Width)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5        17.5          7.0           1  setosa
#> 2         24.5        15.0          7.0           1  setosa
#> 3         23.5        16.0          6.5           1  setosa
iris %>%
  mutate_if(is.numeric, ~ . / Petal.Width)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5        17.5          7.0           1  setosa
#> 2         24.5        15.0          7.0           1  setosa
#> 3         23.5        16.0          6.5           1  setosa
iris %>%
  mutate_if(is.numeric, ~ . / .data$Petal.Width)
#> Error: Column `Sepal.Length` must be length 3 (the number of rows) or one, not 0

iris %>%
  mutate_if(is.numeric, ~ .x / iris$Petal.Width)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5        17.5          7.0           1  setosa
#> 2         24.5        15.0          7.0           1  setosa
#> 3         23.5        16.0          6.5           1  setosa
iris %>%
  mutate_if(is.numeric, ~ .x / Petal.Width)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5        17.5          7.0           1  setosa
#> 2         24.5        15.0          7.0           1  setosa
#> 3         23.5        16.0          6.5           1  setosa
iris %>%
  mutate_if(is.numeric, ~ .x / .data$Petal.Width)
#> Error: Column `Sepal.Length` must be length 3 (the number of rows) or one, not 0


iris %>%
  mutate_if(is.numeric, list(~ . / iris$Petal.Width))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5        17.5          7.0           1  setosa
#> 2         24.5        15.0          7.0           1  setosa
#> 3         23.5        16.0          6.5           1  setosa
iris %>%
  mutate_if(is.numeric, list(~ . / Petal.Width))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5        17.5          7.0           1  setosa
#> 2         24.5        15.0          7.0           1  setosa
#> 3         23.5        16.0          6.5           1  setosa
iris %>%
  mutate_if(is.numeric, list(~ . / .data$Petal.Width))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1         25.5        17.5          7.0           1  setosa
#> 2         24.5        15.0          7.0           1  setosa
#> 3         23.5        16.0          6.5           1  setosa

iris %>%
  mutate_if(is.numeric, list(~ .x / iris$Petal.Width))
#> Error: object '.x' not found
iris %>%
  mutate_if(is.numeric, list(~ .x / Petal.Width))
#> Error: object '.x' not found
iris %>%
  mutate_if(is.numeric, list(~ .x / .data$Petal.Width))
#> Error: object '.x' not found

There are I believe two issues:

  • When formulas are used, e.g. ~ . / Petal.Width we go through rlang::as_function() there is a bit of internal handling for this, but apparently .data is not in scope

  • List of formulas go through as_fun instead of as_function so .x is not properly recognized as equivalent to .

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Feb 22, 2019

@MilesMcBain this should not work:

iris %>% 
  mutate_at(vars(starts_with("Sepal")),
            function(col, normaliser){
              col / normaliser
            },
            .data$Petal.Width)

The additional arguments are not quoted or dealt with specially, so .data$Petal.Width is evaluated normally.

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Feb 22, 2019

@yutannihilation I guess you'd reason that if you supply a formula, you are in total control of the arguments, however here's one trick:

library(dplyr)

tbl <- data.frame(x = c(1:10, NA))
tbl %>% 
  summarise_at(vars(starts_with("x")), ~ mean(.) , na.rm = TRUE)
#>    x
#> 1 NA
tbl %>% 
  summarise_at(vars(starts_with("x")), ~ mean(...) , na.rm = TRUE)
#>     x
#> 1 5.5

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 22, 2019

Wow, cool... I didn't notice this.

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Feb 22, 2019

It might just be an accidental feature 🤗

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 22, 2019

The additional arguments are not quoted or dealt with specially

FYI, I found we can use !!! and !! (only for LHS) for additional arguments by the power of rlang::list2(). Is this intensional feature...?

library(dplyr, warn.conflicts = FALSE)

d <- data.frame(x = 1:4)

mutate_at(d, "x", list(function(x, y, z) x + y + z), y = 10, z = 100)
#>     x
#> 1 111
#> 2 112
#> 3 113
#> 4 114

mutate_at(d, "x", list(function(x, y, z) x + y + z), !!!list(y = 10, z = 100))
#>     x
#> 1 111
#> 2 112
#> 3 113
#> 4 114

a <- quo(y)
b <- quo(z)
mutate_at(d, "x", list(function(x, y, z) x + y + z), !!a := 10, !!b := 100)
#>     x
#> 1 111
#> 2 112
#> 3 113
#> 4 114

Created on 2019-02-22 by the reprex package (v0.2.1)

@romainfrancois
Copy link
Member

@romainfrancois romainfrancois commented Feb 22, 2019

Sounds legit, the list2() call is in dplyr:::as_fun_list()

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Feb 22, 2019

I see.

@lionel-
Copy link
Member

@lionel- lionel- commented Mar 4, 2019

Is this intensional feature...?

I think it's accidental and I wouldn't rely on it. For instance purrr doesn't support splicing in argument lists and probably will not.

(Unless the mapped function supports splicing itself of course.)

@yutannihilation
Copy link
Member

@yutannihilation yutannihilation commented Mar 4, 2019

Oh, should I file a new issue for it? I guess here will be soon closed by #4217.

@lock
Copy link

@lock lock bot commented Aug 31, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Aug 31, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants