group_by() and a filter() fails when using a factor column in a data.frame #4096

schloerke · 2019-01-10T15:50:56Z

When using dplyr v0.7.8, there are no issues.

Using dplyr v0.8.0.9000, a data.frame with a factor column gives unexpected results when it is filtered and then grouped. Factor levels that no longer appear in the filtered data still produce groups, which adds unwanted rows to the final result.

Converting the factor column to character avoids the issue; starting from a tibble that still holds the factor does not (see the output under Broken). In my case, it is useful to preserve the levels to use at a later time.

This was originally found using a dplyr::mutate command.

- Barret

Broken

The reprex output below is missing the warnings that were raised. They read: (converted from warning) no non-missing arguments to min; returning Inf.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
packageVersion("dplyr")
#> [1] '0.8.0.9000'
packageVersion("tibble")
#> [1] '2.0.0'

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# I expect
dt[1,] %>%
  rename(minY = Y)
#>   X minY
#> 1 A    1

# Produces extra rows for combinations that don't exist in the filtered data
dt %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 3 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1
#> 2 B       Inf
#> 3 C       Inf


# does not work when converted to tibble
dt %>%
  as_tibble() %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 3 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1
#> 2 B       Inf
#> 3 C       Inf

# works when data being supplied is a character
dt %>%
  mutate(X = as.character(X)) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <chr> <dbl>
#> 1 A         1

# also produces extra rows when data is originally a tibble (X is still a factor)
tibble(X = dt$X, Y = dt$Y) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 3 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1
#> 2 B       Inf
#> 3 C       Inf

Created on 2019-01-10 by the reprex package (v0.2.1)
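The extra rows and the Inf values in the output above can be reproduced with base R alone, which may help show where they come from. This is only a minimal sketch; the variable names are illustrative and not part of the original reprex.

```r
# Subsetting a factor keeps every level, including ones with no rows left.
x <- factor(c("A", "B", "C", "A", "B", "C"))
kept <- x[x == "A"]
kept_levels <- levels(kept)     # all three levels are still attached
level_counts <- table(kept)     # B and C remain, each with a count of 0

# A group with no rows hands min() a zero-length vector, which yields Inf
# together with the "no non-missing arguments to min" warning.
empty_min <- suppressWarnings(min(integer(0)))

# droplevels() removes the levels that no longer occur in the data.
dropped_levels <- levels(droplevels(kept))
```

Grouping on a factor whose unused levels are kept therefore creates empty groups for B and C, and min() over an empty group returns Inf.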

Working on CRAN

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
packageVersion("dplyr")
#> [1] '0.7.8'
packageVersion("tibble")
#> [1] '2.0.0'

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# I expect
dt[1,] %>%
  rename(minY = Y)
#>   X minY
#> 1 A    1

# No extra rows on 0.7.8: only the group present in the filtered data is returned
dt %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1


# works when converted to tibble
dt %>%
  as_tibble() %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1

# works when data being supplied is a character
dt %>%
  mutate(X = as.character(X)) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <chr> <dbl>
#> 1 A         1

# works when data is originally a tibble
tibble(X = dt$X, Y = dt$Y) %>%
  filter(X == "A") %>%
  group_by(X) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1

Created on 2019-01-10 by the reprex package (v0.2.1)

romainfrancois · 2019-01-11T15:59:42Z

This is a duplicate of #4061 and is dealt with in pull request #4091. Once the PR is merged, you'll be able to use this:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
packageVersion("dplyr")
#> [1] '0.8.0'
packageVersion("tibble")
#> [1] '2.0.0'

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# With .drop = TRUE, levels absent from the filtered data no longer produce extra rows
dt %>%
  filter(X == "A") %>%
  group_by(X, .drop = TRUE) %>%
  summarise(minY = min(Y))
#> # A tibble: 1 x 2
#>   X      minY
#>   <fct> <dbl>
#> 1 A         1

Created on 2019-01-11 by the reprex package (v0.2.1.9000)
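For the case mentioned in the original report, where keeping the levels is useful, the same argument can be set the other way. The following is a sketch that assumes a dplyr version in which group_by() has the .drop argument (0.8.0 or later); it counts rows with n() instead of calling min(), so the empty groups do not trigger warnings.

```r
library(dplyr)

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# .drop = FALSE keeps a group for every factor level, even if it has no rows
level_counts <- dt %>%
  filter(X == "A") %>%
  group_by(X, .drop = FALSE) %>%
  summarise(n = n())
```

Here level_counts should contain one row per level of X, with a count of zero for the levels that were filtered out.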

* add helper methods to work around dev dplyr bug tidyverse/dplyr#4096
* use dplyr PR that fixes group_by bug. Depending on tidyverse/dplyr#4091 to be merged
* use new `.drop = TRUE` argument for `group_by` and remove all `convert_*` methods.
* use ungroup and group(run = droplevels(run)) or similar where appropriate. Works with dplyr 0.7.8 and 0.8.0
* use cran dplyr >= v0.8.0.1 and not the remote
* remove dplyr.R file
* white space
* undo unnecessary changes
* remove notion of from plotting.R
* clean up run = run in group_by
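The commit notes above mention droplevels() as an approach that works with both dplyr 0.7.8 and 0.8.0. Applied to the data from this issue, that idea might look like the sketch below; it is an illustration of the approach, not the code from the referenced commits.

```r
library(dplyr)

dt <- data.frame(
  X = c("A", "B", "C", "A", "B", "C"),
  Y = 1:6,
  stringsAsFactors = TRUE
)

# Drop unused levels while grouping, so no empty groups are created
res <- dt %>%
  filter(X == "A") %>%
  group_by(X = droplevels(X)) %>%
  summarise(minY = min(Y))
```

Because the unused levels are removed before the groups are formed, this does not depend on the .drop argument.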

lock · 2019-07-10T16:27:51Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

schloerke changed the title ~~group_by() on a~~ group_by() and a filter() fails when using a factor column in a data.frame Jan 10, 2019

schloerke added a commit to rstudio/shinyloadtest that referenced this issue Jan 10, 2019

add helper methods to work around dev dplyr bug

7d499c4

tidyverse/dplyr#4096

schloerke mentioned this issue Jan 10, 2019

Bump dplyr version and whitelist events to be plotted rstudio/shinyloadtest#68

Merged

romainfrancois closed this as completed Jan 11, 2019

romainfrancois mentioned this issue Jan 23, 2019

Replace data_frame() by tibble() #4117

Merged

lock bot locked and limited conversation to collaborators Jul 10, 2019
