fct_lump() shouldn't relabel if no lumping occurs #130

Closed
ahaque-utd opened this issue May 30, 2018 · 6 comments

@ahaque-utd ahaque-utd commented May 30, 2018

If only one level of a factor has a share below `prop`, fct_lump() still relabels it as 'Other'. Ideally, it should keep the original level name, since only one level was affected.

Example:
library(forcats)
nRows <- 500
vec <- as.factor(c(rep("X", 0.32 * nRows), rep("Y", 0.08 * nRows), rep("Z", 0.4 * nRows), rep("W", 0.2 * nRows)))
rebinned_vec <- fct_lump(vec, prop = 0.1)

prop.table(table(rebinned_vec)) gives the following output:
    W    X    Z Other
 0.20 0.32 0.40  0.08

In the above code, only the level 'Y' should be affected, as it has less than a 10% share. But since it is the only level affected, shouldn't fct_lump() leave 'Y' as it is rather than creating the 'Other' level?
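
For illustration, here is a rough base-R sketch of the behaviour being requested. It is not how forcats implements fct_lump(); `lump_if_multiple` and its arguments are hypothetical names made up for this example, and it assumes that a level is "rare" when its share is strictly below `prop`.

```r
# Hypothetical helper (not part of forcats): combine levels whose share is
# below `prop` into `other_level`, but only when at least two levels qualify.
lump_if_multiple <- function(f, prop, other_level = "Other") {
  f <- as.factor(f)
  shares <- prop.table(table(f))        # share of each level
  rare <- names(shares)[shares < prop]  # levels below the threshold

  # Zero or one rare level: nothing to combine, so keep the original labels.
  if (length(rare) < 2) {
    return(f)
  }

  # Otherwise relabel the rare levels and put `other_level` last.
  out <- as.character(f)
  out[out %in% rare] <- other_level
  factor(out, levels = c(setdiff(levels(f), rare), other_level))
}

# Same shares as the example above, written as integer counts.
vec <- factor(c(rep("X", 160), rep("Y", 40), rep("Z", 200), rep("W", 100)))

levels(lump_if_multiple(vec, prop = 0.10))  # only Y is rare: labels unchanged
levels(lump_if_multiple(vec, prop = 0.25))  # W and Y are rare: both lumped
```

With `prop = 0.10` only 'Y' falls below the threshold, so the factor comes back untouched; with `prop = 0.25` both 'W' and 'Y' qualify and are merged into a single 'Other' level.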

@hadley hadley added the reprex label Jan 4, 2019

@davidbody davidbody commented Jan 19, 2019

library(forcats)

x <- as_factor(c("apple", "apple", "apple", "banana", "banana", "orange"))
fct_lump(x, 2)
#> [1] apple  apple  apple  banana banana Other 
#> Levels: apple banana Other

Created on 2019-01-19 by the reprex package (v0.2.1)

@zhiiiyang zhiiiyang commented Jan 19, 2019

Here is the reprex!

nRows <- 500
vec <- as.factor(c(rep("X",0.32*nRows),
                   rep("Y",0.08*nRows), 
                   rep("Z",0.4*nRows), 
                   rep('W', 0.2*nRows)))
rebinned_vec <- forcats::fct_lump(vec, prop = 0.1)

prop.table(table(rebinned_vec)) 
#> rebinned_vec
#>     W     X     Z Other 
#>  0.20  0.32  0.40  0.08

Created on 2019-01-19 by the reprex package (v0.2.1)

@hadley hadley commented Jan 19, 2019

Thanks for the reprexes! Do either of you want to take a stab at fixing this issue?

@hadley hadley added feature and removed reprex labels Jan 19, 2019
@hadley hadley changed the title from "Issue with fct_lump rebinning" to "fct_lump() shouldn't relabel if no lumping occurs" Jan 19, 2019

@zhiiiyang zhiiiyang commented Jan 19, 2019

> Thanks for the reprexes! Do either of you want to take a stab at fixing this issue?

Working on it now.
