Operations:
Collapsing factor levels for Status, Marital, Home [trained]
However, the variable Status was not actually collapsed into the "other" level: it has only two levels, and both are well above the threshold. Inspecting the trained step shows that Status nevertheless has collapse = TRUE.
othered$steps[[1]]$objects$Status
$keep
[1] "bad" "good"
$collapse
[1] TRUE
$other
[1] "my_other"
Tracing through the code, I found that the collapse flag is always set to TRUE, when it should be computed from the data.
recipes:::keep_levels
function (x, prop = 0.1, other = "other")
{
    if (!is.factor(x))
        x <- factor(x)
    xtab <- sort(table(x, useNA = "no"), decreasing = TRUE)/sum(!is.na(x))
    dropped <- which(xtab < prop)
    orig <- levels(x)
    if (length(dropped) > 0)
        keepers <- names(xtab[-dropped])
    else keepers <- orig
    if (length(keepers) == 0)
        keepers <- names(xtab)[which.max(xtab)]
    if (other %in% keepers)
        stop("The level ", other, " is already a factor level that will be retained. ",
            "Please choose a different value.", call. = FALSE)
    list(keep = orig[orig %in% keepers], collapse = TRUE, other = other)
}
The issue is in the last line of the function, where collapse is hard-coded. Could it be computed instead, for example collapse = !all(orig %in% keepers) (TRUE only when at least one level is dropped), or something similar?
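For illustration, here is a minimal sketch of what a dynamically evaluated flag could look like. It assumes the flag should be TRUE only when at least one level falls below the threshold, which is exactly when `dropped` is non-empty (equivalently, when not every original level is among the keepers). The function name `keep_levels_sketch` and the toy factors are hypothetical, and this is not necessarily how the package itself resolves the issue.

```r
# Hypothetical variant of recipes:::keep_levels(); only `collapse` differs.
keep_levels_sketch <- function(x, prop = 0.1, other = "other") {
  if (!is.factor(x)) {
    x <- factor(x)
  }
  # Proportion of non-missing observations in each level, largest first
  xtab <- sort(table(x, useNA = "no"), decreasing = TRUE) / sum(!is.na(x))
  dropped <- which(xtab < prop)
  orig <- levels(x)
  if (length(dropped) > 0) {
    keepers <- names(xtab[-dropped])
  } else {
    keepers <- orig
  }
  # If every level is below the threshold, keep the most frequent one
  if (length(keepers) == 0) {
    keepers <- names(xtab)[which.max(xtab)]
  }
  if (other %in% keepers) {
    stop("The level ", other, " is already a factor level that will be retained. ",
         "Please choose a different value.", call. = FALSE)
  }
  list(
    keep = orig[orig %in% keepers],
    collapse = length(dropped) > 0,  # assumption: TRUE only if a level is dropped
    other = other
  )
}

# Toy data (made up): two levels, both well above the 10% threshold
status <- factor(c(rep("bad", 30), rep("good", 70)))
keep_levels_sketch(status, other = "my_other")$collapse
# FALSE

# Toy data (made up): "widow" accounts for 5% of the values
marital <- factor(c(rep("married", 60), rep("single", 35), rep("widow", 5)))
keep_levels_sketch(marital, other = "my_other")$collapse
# TRUE
```

With a flag like this, downstream code could skip variables where nothing needs collapsing instead of treating every tested column as collapsed.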
Previously, if step_other() did not collapse any levels, it would still add an "other" level to the factor. This would lump new factor levels into "other" when data were baked (as step_novel() does). This no longer occurs since it was inconsistent with ?step_other, which said that
step_other()'s print method only reports the variables with collapsed levels (as opposed to any column that was tested to see if it needed collapsing).