Using n() in nested mutate()/summarize() calls gives unexpected results #2080
|
Thanks. What is the expected result? Does it work better with #2190? devtools::install_github("hadley/dplyr#2190") |
|
@hadley: I don't understand the issue here. |
|
I think this is a better example:
library(dplyr)
library(purrr)
df <- tibble(x = list(
  tibble(y = 1:2),
  tibble(y = 1:3),
  tibble(y = 1:4)
))
nrows <- function(df) {
  df %>% summarise(n = n()) %>% .[["n"]]
}
df %>%
  mutate(
    n1 = x %>% map_int(nrows),
    n2 = x %>% map_int(. %>% summarise(n = n()) %>% .[["n"]])
  )
#> # A tibble: 3 × 3
#> x n1 n2
#> <list> <int> <int>
#> 1 <tibble [2 × 1]> 2 3
#> 2 <tibble [3 × 1]> 3 3
#> 3 <tibble [4 × 1]> 4 3
|
|
Same behavior with #2190. |
|
This looks like overly eager substitution by hybrid evaluation. It walks the expression, encounters n(), and replaces it with the row count of the outer data frame, so the call is effectively evaluated as:
df %>%
  mutate(
    n1 = x %>% map_int(nrows),
    n2 = x %>% map_int(. %>% summarise(n = 3) %>% .[["n"]])
  )
I don't know how to fix this, short of disabling hybrid evaluation if the expression cannot be evaluated in full by the hybrid evaluator. But this will break e.g.
@hadley: Please advise. |
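The eager substitution described in this comment can be mimicked with a toy base-R sketch. This is not dplyr's hybrid evaluator; `substitute_n()` is a hypothetical helper written only for illustration. It walks a quoted expression and replaces every `n()` call with a fixed value, ignoring whether the call sits inside a nested lambda that should be evaluated later against a different data frame.

```r
# Hypothetical illustration only: NOT dplyr's actual hybrid evaluator.
# Recursively walk a quoted expression and replace every `n()` call
# with a constant, without checking the scope the call appears in.
substitute_n <- function(expr, n_value) {
  if (!is.call(expr)) {
    return(expr)  # symbols and constants are left untouched
  }
  if (identical(expr, quote(n()))) {
    return(n_value)  # eager replacement, regardless of nesting
  }
  as.call(lapply(as.list(expr), substitute_n, n_value = n_value))
}

# The outer data frame in the thread has 3 rows, so n() becomes 3L
# even though it is nested inside a lambda passed to map_int().
expr <- quote(map_int(x, ~ summarise(., n = n())[["n"]]))
rewritten <- substitute_n(expr, 3L)
print(rewritten)
```

A scope-aware walker would have to stop descending at lambda boundaries (formulas, function definitions, magrittr functional sequences), which is roughly the difficulty raised in the comment.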
|
Ah of course. In that case,
df %>%
  mutate(
    n1 = x %>% map_int(nrows),
    n2 = x %>% map_int(. %>% summarise(n = n()) %>% .[["n"]]),
    n3 = map_int(x, ~ summarise(., n = n())[["n"]]),
    n4 = map_int(x, function(df) summarise(df, n = n())[["n"]])
  )
#> # A tibble: 3 × 5
#> x n1 n2 n3 n4
#> <list> <int> <int> <int> <int>
#> 1 <tibble [2 × 1]> 2 3 3 3
#> 2 <tibble [3 × 1]> 3 3 3 3
#> 3 <tibble [4 × 1]> 4 3 3 3
I think we can probably leave off resolving this until the next version? I suspect it will require more rethinking of how the hybrid evaluator works. |
|
Adding to this issue, is there any reason why output1 is broken, but not output2 or output3? |
|
I'll add this version to the mix, which creates the magrittr lambda outside of the mutate(): I've added the
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(purrr)
df <- tibble(x = list(
  tibble(y = 1:2),
  tibble(y = 1:3),
  tibble(y = 1:4)
))
nrows <- function(df) {
  df %>% summarise(n = n()) %>% .[["n"]]
}
nrows_magrittr_lambda <- . %>% summarise(n = n()) %>% .[["n"]]
trace( dplyr:::mutate.tbl_df, tracer = quote(print(dots)), at = 3 )
#> Tracing function "mutate.tbl_df" in package "dplyr
#> (not-exported)"
#> [1] "mutate.tbl_df"
mutate(df,
  n1 = x %>% map_int(nrows),
  n5 = map_int(x, nrows_magrittr_lambda),
  n2 = x %>% map_int(. %>% summarise(n = n()) %>% .[["n"]]),
  n3 = map_int(x, ~ summarise(., n = n())[["n"]]),
  n4 = map_int(x, function(df) summarise(df, n = n())[["n"]])
)
#> Tracing mutate.tbl_df(df, n1 = x %>% map_int(nrows), n5 = map_int(x, .... step 3
#> $n1
#> <quosure>
#> expr: ^x %>% map_int(nrows)
#> env: global
#>
#> $n5
#> <quosure>
#> expr: ^map_int(x, nrows_magrittr_lambda)
#> env: global
#>
#> $n2
#> <quosure>
#> expr: ^x %>% map_int(. %>% summarise(n = n()) %>% .[["n"]])
#> env: global
#>
#> $n3
#> <quosure>
#> expr: ^map_int(x, ~summarise(., n = n())[["n"]])
#> env: global
#>
#> $n4
#> <quosure>
#> expr: ^map_int(x, function(df) summarise(df, n = n())[["n"]])
#> env: global
#> # A tibble: 3 x 6
#> x n1 n5 n2 n3 n4
#> <list> <int> <int> <int> <int> <int>
#> 1 <tibble [2 × 1]> 2 2 3 3 3
#> 2 <tibble [3 × 1]> 3 3 3 3 3
#> 3 <tibble [4 × 1]> 4 4 3 3 3
Created on 2018-03-05 by the reprex package (v0.2.0). |
|
I finally got it: The |
|
Indeed. Hybrid simplification is too eager. With this debugging: I get: |
|
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
When transitioning from the by_row() approach to the map() approach, I've found that several dplyr/purrr/tidyr functions do not evaluate within the map() environment. For instance, below I was expecting the value returned by n() in the map() example to match that of the by_row() version. Instead it returns the number of rows of the nested input Temp. This might be intended, but I can't think of an obvious way to use dplyr::n() on nested tibbles via map().
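One way to sidestep the question, sketched here rather than taken from the thread: if only the row count of each nested tibble is needed, `nrow()` can be mapped over the list column directly. It is an ordinary base function, not `n()`, so the substitution discussed in this issue should not touch it. The data frame is the one used in the comments; the column name `n_rows` is made up for this example.

```r
library(dplyr)
library(purrr)

# Same nested data as in the examples in this thread.
df <- tibble(x = list(
  tibble(y = 1:2),
  tibble(y = 1:3),
  tibble(y = 1:4)
))

# Count rows per nested tibble with nrow() instead of summarise(n = n()).
res <- df %>%
  mutate(n_rows = map_int(x, nrow))

res$n_rows
```

In line with the n1 and n5 columns shown in the thread, moving the summarise(n = n()) call into a function defined outside of mutate() is the other pattern reported to return the per-tibble counts.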