fct_collapse applying collapsed factors in wrong order when group_other = TRUE #172

Closed
gtm19 opened this issue Feb 4, 2019 · 7 comments
Labels: bug (an unexpected problem or unintended behavior)

@gtm19
Contributor

gtm19 commented Feb 4, 2019

I'm having an issue that is self-explanatory from the title. Reprex below. As you can see, the classification produced by fct_collapse is incorrect.

As requested, a simpler set of examples is below.

Example 1 - Collapsing Character Variables

The fct_collapse function swaps two labels: raspberry is classified as vegetables and broccoli as fruit, when it should be the other way round.

library(forcats)
library(dplyr)
library(tibble)

df <-
  tibble(item = c("apple", "grape", "banana", "broccoli", "raspberry"))

df %>% 
  mutate(category = fct_collapse(item,
                                 fruit = c("apple", "grape", "banana", "raspberry"),
                                 vegetables = "broccoli", group_other = TRUE))
#> # A tibble: 5 x 2
#>   item      category  
#>   <chr>     <fct>     
#> 1 apple     fruit     
#> 2 grape     fruit     
#> 3 banana    fruit     
#> 4 broccoli  fruit     
#> 5 raspberry vegetables

Example 2 - Collapsing Factor Variables

For clarity, and contrary to my previous explanation, this bug affects both character and factor variables (since fct_collapse converts the .f argument to a factor if it is a character variable):

df <-
  tibble(item = factor(c("apple", "grape", "banana", "broccoli", "raspberry")))

df %>% 
  mutate(category = fct_collapse(item,
                                 fruit = c("apple", "grape", "banana", "raspberry"),
                                 vegetables = "broccoli", group_other = TRUE))
#> # A tibble: 5 x 2
#>   item      category  
#>   <fct>     <fct>     
#> 1 apple     fruit     
#> 2 grape     fruit     
#> 3 banana    fruit     
#> 4 broccoli  fruit     
#> 5 raspberry vegetables

Example 3 - With group_other Set to FALSE

Just to demonstrate that this bug does not occur when group_other is set to FALSE:

df <-
  tibble(item = factor(c("apple", "grape", "banana", "broccoli", "raspberry")))

df %>% 
  mutate(category = fct_collapse(item,
                                 fruit = c("apple", "grape", "banana", "raspberry"),
                                 vegetables = "broccoli", group_other = FALSE))
#> # A tibble: 5 x 2
#>   item      category  
#>   <fct>     <fct>     
#> 1 apple     fruit     
#> 2 grape     fruit     
#> 3 banana    fruit     
#> 4 broccoli  vegetables
#> 5 raspberry fruit
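
For anyone who needs the grouped result in the meantime, one possible workaround is to do the mapping by name in base R instead of relying on level positions. This is only a minimal sketch and not how forcats implements fct_collapse: collapse_by_name is a hypothetical helper, its "Other" default label is an assumption, and "bread" is an extra item added purely to exercise the catch-all bucket.

```r
# Hypothetical helper (not part of forcats): collapse values by name lookup,
# so the result does not depend on the order of the factor levels.
collapse_by_name <- function(x, groups, other = "Other") {
  x <- as.character(x)
  # Build a named character vector: names are old values, values are new labels.
  lookup <- unlist(lapply(names(groups), function(new) {
    setNames(rep(new, length(groups[[new]])), groups[[new]])
  }))
  out <- unname(lookup[x])
  # Values not listed in any group go to `other`; genuine NAs stay NA.
  out[is.na(out) & !is.na(x)] <- other
  factor(out, levels = c(names(groups), if (other %in% out) other))
}

# "bread" is an illustrative extra item that belongs to neither group.
item <- c("apple", "grape", "banana", "broccoli", "raspberry", "bread")
category <- collapse_by_name(
  item,
  list(fruit = c("apple", "grape", "banana", "raspberry"),
       vegetables = "broccoli")
)
data.frame(item, category)
```

Because every value is matched on its own name, broccoli ends up in vegetables and raspberry in fruit regardless of how the levels happen to be sorted.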

Session Info

sessionInfo()
#> R version 3.5.2 (2018-12-20)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 17134)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252 
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] bindrcpp_0.2.2     tibble_2.0.1       dplyr_0.7.8       
#> [4] forcats_0.4.0.9000
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.0       knitr_1.21       bindr_0.1.1      magrittr_1.5    
#>  [5] tidyselect_0.2.5 R6_2.3.0         rlang_0.3.1      fansi_0.4.0     
#>  [9] stringr_1.4.0    highr_0.7        tools_3.5.2      xfun_0.4        
#> [13] utf8_1.1.4       cli_1.0.1        htmltools_0.3.6  yaml_2.2.0      
#> [17] digest_0.6.18    assertthat_0.2.0 crayon_1.3.4     purrr_0.3.0     
#> [21] glue_1.3.0       evaluate_0.12    rmarkdown_1.11   stringi_1.2.4   
#> [25] compiler_3.5.2   pillar_1.3.1     pkgconfig_2.0.2

Created on 2019-02-17 by the reprex package (v0.2.1)

@hadley added the reprex (needs a minimal reproducible example) label on Feb 16, 2019
@gtm19 changed the title from "fct_collapse applying collapsed factors in wrong order when input is character vector and group_other = TRUE" to "fct_collapse applying collapsed factors in wrong order when group_other = TRUE" on Feb 17, 2019
@hadley added the bug (an unexpected problem or unintended behavior) label and removed the reprex (needs a minimal reproducible example) label on Feb 18, 2019
@hadley
Member

hadley commented Feb 18, 2019

@AmeliaMN do you want to have a look at this?

@gtm19
Contributor Author

gtm19 commented Feb 18, 2019

@AmeliaMN I have added some further comments on the original pull request which might help also.

@avishaitsur

The reprex still results in incorrect levels.
Is there a solution?

@gtm19
Contributor Author

gtm19 commented Aug 20, 2019

I believe fct_collapse was dropped from 0.4.0, but my pull request #176 contains a fix (should it be resurrected).

@hadley closed this as completed in 34ad9e5 on Sep 3, 2019
@gtm19 added a commit to gtm19/forcats that referenced this issue on Jan 21, 2020:
Moving the bullet point referencing tidyverse#172 and tidyverse#202, as it was put under 0.4.0 heading by mistake -- it was not part of that release. Raised in issue tidyverse#219.