fct_lump() shouldn't relabel if no lumping occurs #130
If only one level in a vector has a share below 'prop', fct_lump() still creates a new level called 'Other'. Ideally, it should keep the original level name, since only one level is affected.
prop.table(table(rebinned_vec)) gives the following output:
In the above code, only the level 'Y' should be affected, as it has less than a 10% share. But since it is the only level affected, shouldn't fct_lump() leave 'Y' as it is rather than creating the 'Other' level?
Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!
If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.
Here is the reprex!
nRows <- 500
vec <- as.factor(c(rep("X",0.32*nRows), rep("Y",0.08*nRows), rep("Z",0.4*nRows), rep('W', 0.2*nRows)))
rebinned_vec <- forcats::fct_lump(vec, prop = 0.1)
prop.table(table(rebinned_vec))
#> rebinned_vec
#>     W     X     Z Other
#>  0.20  0.32  0.40  0.08
Created on 2019-01-19 by the reprex package (v0.2.1)
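For illustration, the behaviour requested in this issue could be sketched roughly as below. This is only a base R sketch, not how forcats implements or would implement it: the helper name `lump_if_many()` is hypothetical, the strict "less than prop" threshold follows the wording of the issue rather than forcats itself, and the merged level lands wherever the first rare level sat rather than at the end.

```r
# Hypothetical helper (not part of forcats): merge rare levels into
# `other_level` only when more than one level falls below `prop`.
lump_if_many <- function(f, prop, other_level = "Other") {
  f <- as.factor(f)
  # Share of observations per level
  shares <- prop.table(table(f))
  # Assumption: "rare" means a share strictly below `prop`
  rare <- names(shares)[shares < prop]
  # With zero or one rare level, relabelling gains nothing: return f as is
  if (length(rare) <= 1) {
    return(f)
  }
  # Giving several levels the same label merges them in base R
  levels(f)[levels(f) %in% rare] <- other_level
  f
}

# Same proportions as the reprex above: only 'Y' is below 10%
vec <- factor(rep(c("X", "Y", "Z", "W"), times = c(160, 40, 200, 100)))
levels(lump_if_many(vec, prop = 0.1))

# Two rare levels ('A' and 'B'), so they are merged
vec2 <- factor(rep(c("A", "B", "C", "D"), times = c(5, 5, 50, 40)))
levels(lump_if_many(vec2, prop = 0.1))
```

With the reprex data, the first call should leave the levels W, X, Y and Z untouched, while the second call merges 'A' and 'B' into a single 'Other' level.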