Keep original labels in step_discretize
#674
You can get those values out using the tidy() method:
library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
data(biomass, package = "modeldata")
biomass_tr <- biomass[biomass$dataset == "Training",]
biomass_te <- biomass[biomass$dataset == "Testing",]
rec <- recipe(HHV ~ carbon,
data = biomass_tr) %>%
step_discretize(carbon)
rec <- prep(rec, biomass_tr)
binned_te <- bake(rec, biomass_te)
table(binned_te$carbon)
#>
#> bin_missing        bin1        bin2        bin3        bin4
#>           0          22          17          25          16
tidy(rec, 1)
#> # A tibble: 5 x 3
#>   terms  value id
#>   <chr>  <dbl> <chr>
#> 1 carbon -Inf  discretize_gclTQ
#> 2 carbon  44.7 discretize_gclTQ
#> 3 carbon  47.1 discretize_gclTQ
#> 4 carbon  49.7 discretize_gclTQ
#> 5 carbon  Inf  discretize_gclTQ

Created on 2021-03-29 by the reprex package (v1.0.0)

You can read more about tidying a recipe here. Can you say more about your use case for wanting factor levels like that in your output?
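One possible workaround, sketched here in base R rather than as a recipes feature: the break points returned by tidy() can be passed to cut() on the un-binned column to get interval-style labels. The break values below are copied from the printed output above, so they are rounded, and the carbon values are made up for illustration. The sketch assumes the bins are right-closed, which is the default for cut(); it is worth checking the result against your own baked data.

```r
# Break points as printed by tidy(rec, 1) above (rounded in the printout).
breaks <- c(-Inf, 44.7, 47.1, 49.7, Inf)

# Hypothetical raw carbon values, for illustration only.
carbon <- c(40.2, 45.0, 47.1, 48.3, 52.9)

# cut() derives interval labels from the breaks.
# Intervals are right-closed by default, so 47.1 falls in the second bin.
carbon_interval <- cut(carbon, breaks = breaks, include.lowest = TRUE)

levels(carbon_interval)
table(carbon_interval)
```

In a real workflow the breaks would come from tidy(rec, 1)$value instead of being typed in, and the raw column would be taken from the data before baking.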
Hi Julia, thanks for the reply! I didn't know that you could tidy the recipe and get these values. To be clearer about my use case: it would be great if, after baking a recipe with new data, you could keep the labels of the step_discretize steps, similar to the ones generated by the cut function, instead of "bin01", "bin02", "bin03", etc. This would make EDA and understanding the data after baking easier. Is there a workaround for this? Minimal reprex:
Tibble generated:
Also related to #157. If
I realized that the behaviour of Minimal reprex:
Which returns:
The downside of this is that it does not necessarily return bins with roughly uniform frequencies.
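For comparison, here is a minimal base R sketch of one way to get cut()-style labels together with roughly uniform bin frequencies: use quantiles as the break points. It is independent of recipes, the data are simulated, and the names x, q_breaks and x_binned are only illustrative.

```r
set.seed(1)
# Hypothetical numeric predictor, simulated for illustration only.
x <- rnorm(200, mean = 47, sd = 3)

# Quartile break points give four bins with roughly equal counts.
q_breaks <- quantile(x, probs = seq(0, 1, by = 0.25))

# include.lowest = TRUE keeps the minimum value inside the first bin.
x_binned <- cut(x, breaks = q_breaks, include.lowest = TRUE)

table(x_binned)
```

With heavily tied data the quantiles can coincide, in which case cut() stops with an error about non-unique breaks; the breaks would then need to be deduplicated, at the cost of fewer bins.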
discretize
step_discretize
Hello folks, is there any ongoing work on this improvement? Cheers!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hi there!
I'd like to ask for a feature that would keep the original labels generated by the internal cut function in discretize, instead of "bin1", "bin2", etc. Perhaps by adding an argument such as keep_cut_labels = TRUE, for example.
Minimal Reproducible Example:
Current Behaviour:
Expected behaviour: