filter() and data frame results, filter(across()) #4678

Closed
romainfrancois opened this issue Dec 30, 2019 · 4 comments
Comments

@romainfrancois
Member

romainfrancois commented Dec 30, 2019

When an expression in filter() returns a data frame, perhaps we should combine all of its columns with &. This would enable something like:

library(dplyr, warn.conflicts = FALSE)

iris %>% 
  filter(across(starts_with("Sepal"), ~ . > 4))
#> Error: filter() expressions should return logical vectors of the same size as the group

iris %>% 
  filter(Sepal.Length > 4 & Sepal.Width > 4)
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.7         4.4          1.5         0.4  setosa
#> 2          5.2         4.1          1.5         0.1  setosa
#> 3          5.5         4.2          1.4         0.2  setosa

iris %>% 
  filter_at(vars(starts_with("Sepal")), all_vars(. > 4))
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> 1          5.7         4.4          1.5         0.4  setosa
#> 2          5.2         4.1          1.5         0.1  setosa
#> 3          5.5         4.2          1.4         0.2  setosa

Created on 2019-12-30 by the reprex package (v0.3.0.9000)

This might be a better model than the current strategy of tricking ... into a single expression with all_exprs()
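
A minimal base R sketch of the idea, assuming that "& all its columns" means reducing the logical columns of the data frame into one logical vector. The helper name and_columns() is hypothetical and is not part of dplyr:

```r
# Hypothetical helper (not a dplyr function): collapse a data frame of
# logical columns into one logical vector by AND-ing the columns together.
and_columns <- function(df) {
  stopifnot(is.data.frame(df), all(vapply(df, is.logical, logical(1))))
  # With no columns there is nothing to restrict, so keep every row.
  if (ncol(df) == 0) {
    return(rep(TRUE, nrow(df)))
  }
  Reduce(`&`, as.list(df))
}

# Build the data frame of logical results that across() would return
# for the two Sepal columns, then use it to subset the rows.
sepal <- iris[startsWith(names(iris), "Sepal")]
tests <- as.data.frame(lapply(sepal, function(x) x > 4))
result <- iris[and_columns(tests), ]
nrow(result)
```

This should select the same three rows as the explicit Sepal.Length > 4 & Sepal.Width > 4 call above.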

@romainfrancois romainfrancois added the feature label Dec 30, 2019
@romainfrancois romainfrancois self-assigned this Dec 30, 2019
@romainfrancois romainfrancois added this to the 0.9.0 milestone Dec 30, 2019
@hadley
Member

hadley commented Dec 30, 2019

Yeah, that makes sense to me.

OTOH maybe all_vars() and any_vars() should become across_all() and across_any()?

@romainfrancois
Member Author

romainfrancois commented Dec 30, 2019

... or we just need a function somewhere that takes a list of logical vectors and reduces them with &, so that we can wrap it around the across() call:

library(dplyr, warn.conflicts = FALSE)
library(purrr)

iris <- as_tibble(iris)

rowAll <- function(df) {
  purrr::reduce(df, `&`)
}
rowAny <- function(df) {
  purrr::reduce(df, `|`)
}

iris %>% 
  filter(rowAll(across(starts_with("Sepal"), ~ . > 3)))
#> # A tibble: 67 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.7         3.2          1.3         0.2 setosa 
#>  3          4.6         3.1          1.5         0.2 setosa 
#>  4          5           3.6          1.4         0.2 setosa 
#>  5          5.4         3.9          1.7         0.4 setosa 
#>  6          4.6         3.4          1.4         0.3 setosa 
#>  7          5           3.4          1.5         0.2 setosa 
#>  8          4.9         3.1          1.5         0.1 setosa 
#>  9          5.4         3.7          1.5         0.2 setosa 
#> 10          4.8         3.4          1.6         0.2 setosa 
#> # … with 57 more rows

iris %>% 
  filter(rowAny(across(starts_with("Sepal"), ~ . > 3)))
#> # A tibble: 150 x 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # … with 140 more rows

Created on 2019-12-30 by the reprex package (v0.3.0.9000)

@romainfrancois
Member Author

romainfrancois commented Dec 30, 2019

But still, given the way across() works with other verbs, it would not be surprising if:

%>% filter(across(starts_with("Sepal"), test))

gave the same result as:

%>% filter(test(Sepal.Length), test(Sepal.Width))
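
The equivalence this relies on can be checked by hand without across(). This is a sketch only: test is a placeholder predicate chosen for illustration, and the columns are AND-ed manually with base R's Reduce() rather than by any data frame method in filter():

```r
library(dplyr, warn.conflicts = FALSE)

# Placeholder predicate, chosen only for illustration.
test <- function(x) x > 3

# Separate conditions: filter() keeps rows where all of them are TRUE.
separate <- iris %>%
  filter(test(Sepal.Length), test(Sepal.Width))

# A single condition built by AND-ing the per-column results by hand.
combined <- iris %>%
  filter(Reduce(`&`, lapply(list(Sepal.Length, Sepal.Width), test)))

all.equal(separate, combined)
```

If filter() combined the columns of a data frame result with &, the across() form would reduce to the second call.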

@hadley
Member

hadley commented Dec 30, 2019

Yeah, I'd say implement the data frame method regardless, and we'll come back later to talk about the overall interface (I suspect we will want row versions of all the existing cumulative and summarising functions).

romainfrancois added a commit that referenced this issue Dec 30, 2019
romainfrancois added a commit that referenced this issue Dec 31, 2019
romainfrancois added a commit that referenced this issue Jan 3, 2020