`step_dummy_multi_choice()` ignores levels of the input factor
#916
Labels: bug
The problem
`step_dummy_multi_choice()` ignores the levels of the input factor and only considers the levels that occur in the new data. This causes a problem if the downstream model uses a dummy variable for a level that does not occur in the new data. This behavior is in contrast to that of `step_dummy()`.

I also noticed that the two functions have a common parameter called `levels`, and it looks like it is intended to explicitly specify the levels to include in the output. However, there are no examples in the documentation and I don't know how to specify it (I tried `levels = list(x = c("a", "b", "c", "d", "e", "f", "g"))` but it failed). Can you add some more explanation of how to use the parameter and/or add an example?

Reproducible example
I would expect `step_dummy()` and `step_dummy_multi_choice()` to produce the same number of columns when applied to the same factor. However, `step_dummy_multi_choice()` does not create columns for levels `"a", "b", "f", "g"`.

By the way, why do `step_dummy()` and `step_dummy_multi_choice()` produce different types of output (double vs integer)?

Created on 2022-02-24 by the reprex package (v2.0.0)
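
The comparison described above can be sketched roughly as follows. This is not the original reprex but a minimal illustration under stated assumptions: the data frame `df`, the choice of observed values `"c"`, `"d"`, `"e"`, and the use of `one_hot = TRUE` in `step_dummy()` are all hypothetical, picked so that the declared levels `"a"` through `"g"` include levels that never occur in the data. The result of `step_dummy_multi_choice()` may differ between versions of recipes, so no output is shown.

```r
library(recipes)

# Hypothetical data: a factor with seven declared levels ("a" to "g"),
# of which only "c", "d" and "e" actually occur.
df <- data.frame(x = factor(c("c", "d", "e"), levels = letters[1:7]))

# step_dummy() with one_hot = TRUE (an assumption made for this sketch)
# creates one indicator column per declared level.
dummy_out <- recipe(~ x, data = df) |>
  step_dummy(x, one_hot = TRUE) |>
  prep() |>
  bake(new_data = NULL)

# step_dummy_multi_choice() applied to the same factor.
multi_out <- recipe(~ x, data = df) |>
  step_dummy_multi_choice(x) |>
  prep() |>
  bake(new_data = NULL)

# Compare which columns were created and what type they have.
names(dummy_out)
names(multi_out)
sapply(dummy_out, typeof)
sapply(multi_out, typeof)
```

Comparing the two `names()` results shows whether unobserved levels get a column, and the two `typeof` results show the double vs integer difference.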