group_by() and a filter() fails when using a factor column in a data.frame #4096

Closed
schloerke opened this issue Jan 10, 2019 · 2 comments

@schloerke
Contributor

When using dplyr v0.7.8, there are no issues.

Using dplyr v0.8.0.9000, data.frames with a factor column give unexpected results when filtered and then grouped. The factor levels that no longer appear in the filtered data create unwanted empty groups in the final result.

Using a tibble built from character vectors from the beginning does not cause an issue, because the column never becomes a factor. In my case, it is useful to preserve the levels to use at a later time.

This was originally found using a dplyr::mutate command.

- Barret

Broken

The reprex output below omits the warnings that were raised. They read: (converted from warning) no non-missing arguments to min; returning Inf.
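As a rough illustration of where that warning and the `Inf` values likely come from, here is a base R sketch (not dplyr's internals; the names `filtered` and `empty_min` exist only for this example). Subsetting a factor column keeps all of its levels, and `min()` on an empty vector returns `Inf` with a warning.

```r
# Same data as in the reprex.
dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# Base R subsetting, standing in for filter(X == "A").
filtered <- dt[dt$X == "A", ]

# The factor still carries the unused levels "B" and "C".
levels(filtered$X)

# Counting rows per level shows two empty groups.
table(filtered$X)

# min() over an empty group returns Inf (the warning is suppressed here).
empty_min <- suppressWarnings(min(filtered$Y[filtered$X == "B"]))
empty_min
```

If grouping creates one group per level rather than per observed value, the empty "B" and "C" groups would each hit this `min()` case.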

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
packageVersion("dplyr")
#> [1] '0.8.0.9000'
packageVersion("tibble")
#> [1] '2.0.0'

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# I expect
dt[1,] %>%
  rename(minY = Y)
#>   X minY
#> 1 A    1

# Produces extra rows for combinations that don't exist in the filtered data
dt %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 3 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1
#> 2 B       Inf
#> 3 C       Inf


# does not work when converted to tibble
dt %>%
  as_tibble() %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 3 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1
#> 2 B       Inf
#> 3 C       Inf

# works when data being supplied is a character
dt %>%
  mutate(X = as.character(X)) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <chr> <dbl>
#> 1 A         1

# also produces extra rows when the tibble is built from the factor column dt$X
tibble(X = dt$X, Y = dt$Y) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 3 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1
#> 2 B       Inf
#> 3 C       Inf

Created on 2019-01-10 by the reprex package (v0.2.1)

Working on CRAN

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
packageVersion("dplyr")
#> [1] '0.7.8'
packageVersion("tibble")
#> [1] '2.0.0'

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# I expect
dt[1,] %>%
  rename(minY = Y)
#>   X minY
#> 1 A    1

# Produces extra rows for combinations that don't exist in the filtered data
dt %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1


# does not work when converted to tibble
dt %>%
  as_tibble() %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1

# works when data being supplied is a character
dt %>%
  mutate(X = as.character(X)) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <chr> <dbl>
#> 1 A         1

# works when data is originally a tibble
tibble(X = dt$X, Y = dt$Y) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1

Created on 2019-01-10 by the reprex package (v0.2.1)

@schloerke changed the title to "group_by() and a filter() fails when using a factor column in a data.frame" Jan 10, 2019
schloerke added a commit to rstudio/shinyloadtest that referenced this issue Jan 10, 2019
@romainfrancois
Member

This is a duplicate of #4061 and is dealt with in pull request #4091. Once the PR is merged, you'll be able to use this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
packageVersion("dplyr")
#> [1] '0.8.0'
packageVersion("tibble")
#> [1] '2.0.0'

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# With .drop = TRUE, levels absent from the filtered data no longer produce extra rows
dt %>%
  filter(X == "A") %>%
  group_by(X, .drop = TRUE) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1

Created on 2019-01-11 by the reprex package (v0.2.1.9000)
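To make the effect of `.drop` easier to see, the following sketch compares both settings on the same filtered data. It assumes a dplyr release that includes the `.drop` argument of `group_by()`, and it uses `n()` instead of `min()` so that empty groups yield 0 rather than a warning. The names `only_a`, `kept`, and `dropped` are only for illustration.

```r
library(dplyr)

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

only_a <- dt %>%
  filter(X == "A")

# .drop = FALSE keeps one group per factor level, including empty ones.
kept <- only_a %>%
  group_by(X, .drop = FALSE) %>%
  summarise(n = n())

# .drop = TRUE keeps only the levels that are present in the data.
dropped <- only_a %>%
  group_by(X, .drop = TRUE) %>%
  summarise(n = n())

nrow(kept)
nrow(dropped)
```

Keeping empty groups can still be useful when every level should appear in a summary, for example in count tables.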

wch pushed a commit to rstudio/shinyloadtest that referenced this issue May 13, 2019
* add helper methods to work around dev dplyr bug

tidyverse/dplyr#4096

* use dplyr PR that fixes group_by bug

Depending on tidyverse/dplyr#4091 to be merged

* use new `.drop = TRUE` argument for `group_by` and remove all `convert_*` methods.

* use ungroup and group(run = droplevels(run)) or similar where appropriate

works with dplyr 0.7.8 and 0.8.0

* use cran dplyr >= v0.8.0.1 and not the remote

* remove dplyr.R file

* white space

* undo unnecessary changes

* remove notion of  from plotting.R

* clean up run = run in group_by
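The commit notes above mention dropping levels inside the grouping call as a workaround that runs on both dplyr 0.7.8 and 0.8.0. A minimal sketch of that idea on the data from this issue might look like the following. It assumes the `group(...)` in the note refers to `group_by()`, and it uses the issue's `X` column in place of shinyloadtest's `run` column.

```r
library(dplyr)

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# droplevels() removes the unused levels "B" and "C" before grouping,
# so no empty groups exist regardless of how group_by() treats them.
res <- dt %>%
  filter(X == "A") %>%
  group_by(X = droplevels(X)) %>%
  summarise(minY = min(Y))

res
```

The trade-off is that the original levels are lost in the result, which matters if they are needed later.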
@lock

lock bot commented Jul 10, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Jul 10, 2019