fct_lump all with count above/below threshold #142

Closed
RoyalTS opened this issue Jul 24, 2018 · 3 comments

4 participants
RoyalTS commented Jul 24, 2018

It would be great if fct_lump() had an additional mode that lumps factor levels based not on the desired final number of levels (n) or on the proportion of the data that falls in each level (prop), but on each level's raw count (call it count?).

Stealing the example data from #43:

N <- 5
M <- 2 ** N
exp_factor <- factor(rep(1:(N + M), c(2 ** (N:1), rep(1, M))))
table(exp_factor)
## exp_factor
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 
## 32 16  8  4  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
## 26 27 28 29 30 31 32 33 34 35 36 37 
##  1  1  1  1  1  1  1  1  1  1  1  1

Desired behavior: forcats::fct_lump(exp_factor, count = 1) would retain only the levels whose count exceeds 1, i.e. levels 1:5, and lump everything else into Other.
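A minimal base-R sketch of the requested behavior, assuming the strict "count above threshold" reading of `count` described above. The helper `lump_above_count()` is hypothetical and not part of forcats; the snippet rebuilds the example factor so it runs on its own.

```r
# Rebuild the example factor from the issue (levels 1:5 have counts 32..2,
# levels 6:37 each appear exactly once)
N <- 5
M <- 2^N
exp_factor <- factor(rep(1:(N + M), c(2^(N:1), rep(1, M))))

# Hypothetical helper (not a forcats function): keep levels whose raw count
# is strictly greater than `count`, collapse the rest into `other_level`.
lump_above_count <- function(f, count, other_level = "Other") {
  f <- as.factor(f)
  counts <- table(f)                     # raw count per level (NAs excluded)
  keep <- names(counts)[counts > count]  # levels strictly above the threshold
  chr <- as.character(f)
  to_other <- !is.na(chr) & !(chr %in% keep)  # leave NA values as NA
  chr[to_other] <- other_level
  # Only add the "Other" level if something was actually lumped
  new_levels <- if (any(to_other)) c(keep, other_level) else keep
  factor(chr, levels = new_levels)
}

lumped <- lump_above_count(exp_factor, count = 1)
table(lumped)
```

With the example data this should leave levels 1 through 5 unchanged and collapse the 32 singleton levels into a single Other level with 32 observations.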


AshesITR commented Nov 17, 2018

That's a good idea. Perhaps naming the argument min_count and using a >= comparison would make the behavior clearer.
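Because level counts are integers, a strict `> count` threshold and an inclusive `>= min_count` threshold select the same levels whenever `min_count = count + 1`. The snippet below checks this on the example data; the variable names are illustrative only. Later forcats releases added `fct_lump_min(f, min)`, which keeps levels appearing at least `min` times, so `fct_lump_min(exp_factor, min = 2)` should give the desired result; the snippet sticks to base R to avoid the dependency.

```r
# Rebuild the example factor from the issue
N <- 5
M <- 2^N
exp_factor <- factor(rep(1:(N + M), c(2^(N:1), rep(1, M))))

counts <- table(exp_factor)
above_1    <- names(counts)[counts > 1]   # original proposal: count = 1, strict >
at_least_2 <- names(counts)[counts >= 2]  # suggested min_count = 2, inclusive >=

# For integer counts the two rules keep exactly the same levels
identical(above_1, at_least_2)
```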

hadley (Member) commented Jan 4, 2019

I think this is a reasonable request, but my main concern is making fct_lump() too complicated. It might be time to start spinning out fct_lump_prop() etc.

robinsones (Contributor) commented Jan 19, 2019

👋 I'll try it

This was referenced Jan 19, 2019

hadley closed this in #166 on Jan 23, 2019

hadley added a commit that referenced this issue Jan 23, 2019
