Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the output of fct_collapse(..., `group_other = TRUE) seems wrong order. #219

Closed
cuttlefish44 opened this issue Nov 8, 2019 · 5 comments
Closed

Comments

@cuttlefish44
Copy link

When I use fct_collapse with group_other = TRUE, the output are not in correct order. Is this by design ?

library(forcats)

factor_vec <- as.factor(LETTERS[c(5,1,2,4,3)])

a <- fct_collapse(factor_vec,
                  BD = c("B", "D"),
                  ACE = c("A", "C", "E"))

b <- fct_collapse(factor_vec,
                  BD = c("B", "D"),
                  AC = c("A", "C"),
                  group_other = TRUE)

data.frame(origin = factor_vec, all_lev_named = a, with_group_other = b)
#>   origin all_lev_named with_group_other
#> 1      E           ACE            Other
#> 2      A           ACE               BD
#> 3      B            BD               BD
#> 4      D            BD               AC
#> 5      C           ACE               AC

Created on 2019-11-08 by the reprex package (v0.3.0)

@xjlc
Copy link

xjlc commented Nov 16, 2019

I agree, also encountered this problem.

For the maintainers: since fct_other seems to work, couldn't that be used as a fix?

I'm not deep enough in the code to fix it myself and don't really have the time to test this, but Instead of

if (group_other) {
    f <- check_factor(.f)
    levels <- levels(f)
    new[["Other"]] <- levels[!levels %in% levs]
    levs <- c(levs, new[["Other"]])
}
...
if (group_other) {
    fct_relevel(out, "Other", after = Inf)
  } else {
    out
 }

something like this (maybe with the conditional moved to the top of the function?)

 if (group_other) {
   .f <- fct_other(.f)
}
...
out

@mjanson
Copy link

mjanson commented Jan 16, 2020

Sorry if this is unrelated, but I figured it would be better to ask here first before opening a new issue:

> titles$Title
 [1] Mr     Mrs    Miss   Mrs    Mr     Mr     Mr     Master Mrs    Mrs   
Levels: Master Miss Mr Mrs
> fct_collapse(titles$Title, Mr = "Mr", group_other=TRUE)
 [1] Other Other Other Other Other Other Other Mr    Other Other
Levels: Mr Other
> packageVersion("forcats")
[1] ‘0.4.0’
>

@strazto
Copy link

strazto commented Jan 21, 2020

Weirdly enough, I remember seeing this a while ago, in #172 , and #202 .

We see it "fixed" (as of 0.4.0, according to the changelog apparently ) here:

34ad9e5#diff-8312ad0561ef661716b48d09478362f3

forcats/NEWS.md

Lines 1 to 13 in 34ad9e5

# forcats (development version)
* `first2()`, a `fct_reorder2()` helper function, sorts `.y` by the first value of `.x` (@jtr13).
# forcats 0.4.0
## New features
* `fct_collapse()` gains a `group_other` argument to allow you to group all
un-named levels into `"Other"`. (#100, @AmeliaMN)
* fixed bug in `fct_collapse()` so it now correctly collapses factors when `group_other = TRUE` (#172), and makes `"Other"` the last level (#202) (@gtm19, #172 & #202)

That change was pushed in september, however, and the cran release of forcats 0.4.0 was: 2019-02-17 .

My version of forcats

If examine my install of forcats with packageVersion, I see that it's coming from the development branch:

> packageVersion("forcats")
[1] ‘0.4.0.9000

If I probe into the changelog, I notice that it is apparently "fixed"

> system.file("NEWS.md", package = "forcats") %>% readLines() %>% cat(sep = "\n")

State of NEWS.md for my installed version of forcats

forcats (development version)

  • first2(), a fct_reorder2() helper function, sorts .y by the first value of .x (@jtr13).

forcats 0.4.0

New features


@strazto
Copy link

strazto commented Jan 21, 2020

This is a weirdly hard bug to reproduce - I get the same (incorrect) output when I run @cuttlefish44 's reprex, but when I try to reproduce this with other examples, they sometimes work correctly

gtm19 added a commit to gtm19/forcats that referenced this issue Jan 21, 2020
Moving the bullet point referencing tidyverse#172 and tidyverse#202, as it was put under 0.4.0 heading by mistake -- it was not part of that release. Raised in issue tidyverse#219.
@gtm19
Copy link
Contributor

gtm19 commented Jan 21, 2020

@cuttlefish44 @xjlc @mjanson: this bug is fixed in the development version of forcats:

library(forcats)

packageVersion("forcats")
#> [1] '0.4.0.9000'

factor_vec <- as.factor(LETTERS[c(5,1,2,4,3)])

a <- fct_collapse(factor_vec,
                  BD = c("B", "D"),
                  ACE = c("A", "C", "E"))

b <- fct_collapse(factor_vec,
                  BD = c("B", "D"),
                  AC = c("A", "C"),
                  group_other = TRUE)

data.frame(origin = factor_vec, all_lev_named = a, with_group_other = b)
#>   origin all_lev_named with_group_other
#> 1      E           ACE            Other
#> 2      A           ACE               AC
#> 3      B            BD               BD
#> 4      D            BD               BD
#> 5      C           ACE               AC

Created on 2020-01-21 by the reprex package (v0.3.0)

You can install as follows:

devtools::install_github("tidyverse/forcats")

@mstr3336 As you can see, I get the correct output using forcats 0.4.0.9000, but the incorrect output with 0.4.0. The issue was fixed in the development version, but due to some confusion as (on my part) as to whether it would make it into 0.4.0, I added the NEWS item under the wrong heading. It should be under forcats (development version). Apologies. I have created a pull request (#229) to fix this.

@hadley hadley closed this as completed in 7801988 Jan 21, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants