fct_lump all with count above/below threshold #142

Closed
RoyalTS opened this issue Jul 24, 2018 · 3 comments

@RoyalTS RoyalTS commented Jul 24, 2018

It would be great if fct_lump() had an additional mode that lumped factor levels based on the raw count of each level (call it count?), rather than on the number of levels to keep (n) or the proportion of the data each level accounts for (prop).

Stealing the example data from #43:

# N levels with counts 2^N, ..., 2, followed by M levels that occur once each
N <- 5
M <- 2^N
exp_factor <- factor(rep(1:(N + M), c(2^(N:1), rep(1, M))))
table(exp_factor)
## exp_factor
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
## 32 16  8  4  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
## 26 27 28 29 30 31 32 33 34 35 36 37 
##  1  1  1  1  1  1  1  1  1  1  1  1

Desired behavior: forcats::fct_lump(exp_factor, count = 1) would lump every level that occurs at most once, retaining only the factor levels 1:5 and Other.
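For illustration, the count-based lumping described above can be sketched in base R. This is only a sketch under assumptions: `lump_by_count()` and its `count` and `other_level` arguments are hypothetical names, not part of forcats, and the threshold is read as "lump levels whose count is at most `count`".

```r
# Hypothetical helper (not a forcats function): lump every level whose raw
# count is at most `count` into a single `other_level`.
lump_by_count <- function(f, count, other_level = "Other") {
  f <- as.factor(f)
  counts <- as.vector(table(f))        # occurrences per level, in level order
  rare <- levels(f)[counts <= count]   # levels to be lumped
  if (length(rare) == 0) {
    return(f)                          # nothing falls under the threshold
  }
  keep <- setdiff(levels(f), rare)
  x <- as.character(f)
  x[x %in% rare] <- other_level
  # Kept levels retain their order; the lumped level goes last
  factor(x, levels = unique(c(keep, other_level)))
}

# Example data from the issue
N <- 5
M <- 2^N
exp_factor <- factor(rep(1:(N + M), c(2^(N:1), rep(1, M))))

lumped <- lump_by_count(exp_factor, count = 1)
table(lumped)
```

With the example data, the 32 levels that occur once are merged into Other, and levels 1 to 5 keep their original counts.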

@AshesITR AshesITR commented Nov 17, 2018

That's a good idea. Maybe naming the argument min_count with comparison >= makes the functionality clearer...
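A rough sketch of what the `min_count` reading could look like, where a level is kept when its count is >= `min_count`. The function name `lump_min_count()` is hypothetical and only the argument name and the comparison come from the suggestion above; it relies on base R merging factor levels that are assigned the same name.

```r
# Hypothetical helper (not a forcats function): keep levels whose count is
# >= min_count and merge all others into `other_level`.
lump_min_count <- function(f, min_count, other_level = "Other") {
  f <- as.factor(f)
  counts <- as.vector(table(f))           # occurrences per level, in level order
  rare <- levels(f)[counts < min_count]   # levels that fail the >= comparison
  if (length(rare) > 0) {
    # Assigning the same name to several levels merges them into one level
    levels(f)[levels(f) %in% rare] <- other_level
    # Move the merged level to the end
    f <- factor(f, levels = c(setdiff(levels(f), other_level), other_level))
  }
  f
}

f <- factor(c("a", "a", "a", "b", "b", "c"))
table(lump_min_count(f, min_count = 2))
table(lump_min_count(f, min_count = 3))
```

Under this reading, `min_count = 2` corresponds to `count = 1` in the original proposal: both lump exactly the levels that occur once.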

@hadley hadley (Member) commented Jan 4, 2019

I think this is a reasonable request, but my main concern is making fct_lump() too complicated. It might be time to start spinning out fct_lump_prop() etc.

@robinsones robinsones (Contributor) commented Jan 19, 2019

👋 I'll try it

4 participants