Keep original labels in step_discretize #951

Merged
topepo merged 4 commits into main from discretize-bins
May 4, 2022

@EmilHvitfeldt
Member

This PR aims to close #674.

It does two things:

  • Makes the factor levels match the output of cut() when prefix = NULL is set in discretize(). This essentially disables the custom labeling.
  • Sets the default value of prefix in discretize() to NULL.
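
To illustrate the difference between the two labeling schemes, here is a minimal base R sketch. It does not call discretize() itself; the vector `x` and the quantile-based breaks are made-up assumptions for illustration only, and the real function may compute its breaks differently.

```r
# Toy data (hypothetical values, not taken from biomass)
x <- c(42.1, 45.3, 47.5, 50.2, 46.0, 48.8, 44.0, 51.7)

# Quartile break points, with the outer breaks opened to -Inf and Inf
# so that every value falls into some bin
breaks <- unname(quantile(x, probs = seq(0, 1, by = 0.25)))
breaks[1] <- -Inf
breaks[length(breaks)] <- Inf

# Interval-style labels, as produced by cut() (the prefix = NULL case)
cut_labels <- cut(x, breaks = breaks)

# "bin"-style labels: the prefix followed by the bin index (the prefix = "bin" case)
bin_levels <- paste0("bin", seq_len(nlevels(cut_labels)))
bin_labels <- factor(bin_levels[as.integer(cut_labels)], levels = bin_levels)

levels(cut_labels)
levels(bin_labels)
```

Both factors encode the same bin membership; only the level names differ.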

The old behavior is still possible by setting prefix to "bin". I think that changing the default is worthwhile since it aligns better with what happens in the {embed} steps.

library(recipes)
library(modeldata)
data(biomass)

recipe(HHV ~ carbon, data = biomass) %>% 
  step_discretize(carbon) %>%
  prep() %>%
  bake(new_data = NULL)
#> # A tibble: 536 × 2
#>    carbon        HHV
#>    <fct>       <dbl>
#>  1 (49.7, Inf]  20.0
#>  2 (47.1,49.7]  19.2
#>  3 (47.1,49.7]  18.3
#>  4 (44.7,47.1]  18.2
#>  5 (44.7,47.1]  18.4
#>  6 (44.7,47.1]  18.5
#>  7 (47.1,49.7]  18.7
#>  8 (44.7,47.1]  18.3
#>  9 (47.1,49.7]  18.6
#> 10 (44.7,47.1]  18.9
#> # … with 526 more rows

recipe(HHV ~ carbon, data = biomass) %>% 
  step_discretize(carbon, options = list(prefix = "bin")) %>%
  prep() %>%
  bake(new_data = NULL)
#> # A tibble: 536 × 2
#>    carbon   HHV
#>    <fct>  <dbl>
#>  1 bin4    20.0
#>  2 bin3    19.2
#>  3 bin3    18.3
#>  4 bin2    18.2
#>  5 bin2    18.4
#>  6 bin2    18.5
#>  7 bin3    18.7
#>  8 bin2    18.3
#>  9 bin3    18.6
#> 10 bin2    18.9
#> # … with 526 more rows

Created on 2022-04-13 by the reprex package (v2.0.1)

Member

@topepo topepo left a comment

This should also add a keep_original_cols argument and error if the prefix is NULL.

@EmilHvitfeldt
Member Author

Which function should error if prefix = NULL?

@topepo
Member

topepo commented Apr 19, 2022

If keep_original_cols && is.null(prefix), the bake method should error.

@EmilHvitfeldt
Member Author

I don't think that keep_original_cols fits this step. I wrote up something a little longer here: #929 (comment). In essence, so far we have only used keep_original_cols in steps that generate new columns. This step doesn't do that.

@topepo topepo merged commit 0c43719 into main May 4, 2022
@topepo topepo deleted the discretize-bins branch May 4, 2022 16:54
@github-actions
Contributor

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions Bot locked and limited conversation to collaborators May 19, 2022

Development

Successfully merging this pull request may close these issues.

Keep original labels in step_discretize

2 participants