`n` argument for `step_other` #289

Closed
brshallo opened this issue Feb 28, 2019 · 2 comments

@brshallo brshallo commented Feb 28, 2019

It would be helpful if the `threshold` argument of `step_other()` could be specified as an integer count (rather than only as a proportion), or if there were a separate `n` argument giving the minimum number of observations a level must have; levels with fewer observations would be collapsed into the "other" category.
E.g. to specify a minimum sample size of 30:

recipe(price ~ clarity + color + carat, data = diamonds) %>% 
  step_other(all_nominal(), n = 30)

I'm curious what the best way of doing this is currently? (Link to hack for specifying consistent sample size to step_other across datasets of different sizes.)

@brshallo brshallo commented May 28, 2019

Using `step_mutate()` with `forcats::fct_lump_min()` seems to accomplish this. Is there a way, though, to apply `step_mutate()` over `all_nominal()`, in the way one might with `mutate_if()`/`_at()`/`_all()`?
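
For readers following along, here is a minimal sketch of that workaround on a single column. The toy data frame `df` and its columns `x` and `y` are made up for illustration and are not from the issue. One caveat worth keeping in mind: `step_mutate()` simply re-evaluates its expression on whatever data is baked, so the lumping is recomputed on new data rather than learned from the training set.

```r
library(recipes)
library(forcats)

# Hypothetical toy data: levels "c" and "d" each occur fewer than 30 times.
df <- data.frame(
  y = seq_len(108),
  x = factor(rep(c("a", "b", "c", "d"), times = c(60, 40, 5, 3)))
)

# Lump every level of x with fewer than 30 observations into "other".
rec <- recipe(y ~ x, data = df) %>%
  step_mutate(x = fct_lump_min(x, min = 30, other_level = "other"))

prepped <- prep(rec, training = df)
baked <- bake(prepped, new_data = df)

# Inspect the resulting level counts.
table(baked$x)
```

This covers one named column only; it does not by itself answer the question of applying the same transformation across a selector such as `all_nominal()`.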

@topepo topepo commented Jun 28, 2019

Instead of adding a new argument, we could just use `threshold` and, if it is greater than 1, treat it as a frequency (a count) rather than a proportion.
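
A sketch of how that suggestion would look from the user's side, assuming a version of recipes in which `step_other()` interprets a `threshold` of 1 or more as a count rather than a proportion. The toy data frame `df` is hypothetical and only serves to make the example self-contained.

```r
library(recipes)

# Hypothetical toy data: levels "c" and "d" each occur fewer than 30 times.
df <- data.frame(
  y = seq_len(108),
  x = factor(rep(c("a", "b", "c", "d"), times = c(60, 40, 5, 3)))
)

# Assumption: threshold >= 1 is read as a minimum count, so levels seen
# fewer than 30 times in the training data are pooled into "other".
rec <- recipe(y ~ x, data = df) %>%
  step_other(all_nominal(), threshold = 30)

prepped <- prep(rec, training = df)
baked <- bake(prepped, new_data = df)

# Inspect the resulting level counts.
table(baked$x)
```

Unlike the `step_mutate()` route, the levels to pool are determined when the recipe is prepped on the training data and then reused when baking new data, which keeps the behaviour consistent across datasets of different sizes.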
