fct_lump() shouldn't relabel if no lumping occurs #130

Closed
ahaque-utd opened this issue May 30, 2018 · 6 comments

Comments

@ahaque-utd

commented May 30, 2018

If only one value in a vector has a share at or below 'prop', fct_lump() still creates a new level called 'Other'. Ideally, it should keep the original level name, since only one level is affected.

Example:
library(forcats)
nRows <- 500
vec <- as.factor(c(rep("X", 0.32 * nRows), rep("Y", 0.08 * nRows), rep("Z", 0.4 * nRows), rep("W", 0.2 * nRows)))
rebinned_vec <- fct_lump(vec, prop = 0.1)

prop.table(table(rebinned_vec)) gives the following output:
    W     X     Z Other
 0.20  0.32  0.40  0.08

In the code above, only the level 'Y' is affected, since it has less than a 10% share. But because it is the only level affected, shouldn't fct_lump() leave 'Y' as it is rather than creating the 'Other' level?
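The requested behaviour can be sketched in base R. This is only an illustration of the idea, not the forcats implementation: `lump_if_needed()` and its arguments are hypothetical names, and unlike fct_lump() it treats only shares strictly below `prop` as rare and does not move the merged level to the end.

```r
# Hypothetical helper (not part of forcats): merge levels whose share is
# below `prop` into `other_level`, but only when at least two levels qualify.
lump_if_needed <- function(f, prop, other_level = "Other") {
  f <- as.factor(f)
  shares <- prop.table(table(f))
  rare <- names(shares)[as.vector(shares) < prop]

  # With zero or one rare level there is nothing to merge,
  # so the original labels are kept.
  if (length(rare) < 2) {
    return(f)
  }

  # Assigning the same name to several levels merges them.
  lv <- levels(f)
  lv[lv %in% rare] <- other_level
  levels(f) <- lv
  f
}

# Same proportions as the example above: only "Y" (8%) is below 10%.
vec <- factor(rep(c("X", "Y", "Z", "W"), times = c(160, 40, 200, 100)))
levels(lump_if_needed(vec, prop = 0.1))

# Two rare levels ("B" and "C", 5% each), so they are merged.
vec2 <- factor(rep(c("A", "B", "C"), times = c(90, 5, 5)))
levels(lump_if_needed(vec2, prop = 0.1))
```

With the first vector the levels should stay W, X, Y, Z; with the second, B and C should collapse into a single 'Other' level.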

@hadley hadley added the reprex label Jan 4, 2019

@hadley

Member

commented Jan 4, 2019

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

@davidbody

commented Jan 19, 2019

I'll create a reprex for this.

@davidbody

commented Jan 19, 2019

library(forcats)

x <- as_factor(c("apple", "apple", "apple", "banana", "banana", "orange"))
fct_lump(x, 2)
#> [1] apple  apple  apple  banana banana Other 
#> Levels: apple banana Other

Created on 2019-01-19 by the reprex package (v0.2.1)

@zhiiiyang

Contributor

commented Jan 19, 2019

Here is the reprex!

nRows <- 500
vec <- as.factor(c(rep("X",0.32*nRows),
                   rep("Y",0.08*nRows), 
                   rep("Z",0.4*nRows), 
                   rep('W', 0.2*nRows)))
rebinned_vec <- forcats::fct_lump(vec, prop = 0.1)

prop.table(table(rebinned_vec)) 
#> rebinned_vec
#>     W     X     Z Other 
#>  0.20  0.32  0.40  0.08

Created on 2019-01-19 by the reprex package (v0.2.1)

@hadley

Member

commented Jan 19, 2019

Thanks for the reprexes! Do either of you want to take a stab at fixing this issue?

@hadley hadley added feature and removed reprex labels Jan 19, 2019

@hadley hadley changed the title from "Issue with fct_lump rebinning" to "fct_lump() shouldn't relabel if no lumping occurs" Jan 19, 2019

@zhiiiyang

Contributor

commented Jan 19, 2019

> Thanks for the reprexes! Do either of you want to take a stab at fixing this issue?

Working on it now.
