Closed
Labels: feature (a feature request or enhancement)
Description
Background
For columns with zero variance, step_normalize() replaces the whole column with NaNs. I had a very hard time debugging this: when I tried to feed the data to a model, all I got was "Error: Missing data in columns:".
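A likely explanation, assuming step_normalize() applies the usual (x - mean) / sd transformation: a constant column has a standard deviation of 0, so every centered value is 0 and every scaled value becomes 0 / 0, which R evaluates to NaN. This base-R sketch reproduces that arithmetic by hand; it does not call recipes at all.

```r
# A constant (zero-variance) column, like x2 in the example below
x2 <- rep(1, 10)

# Center and scale by hand: (x - mean) / sd
centered <- x2 - mean(x2)  # all zeros
s <- sd(x2)                # 0 for a constant column
scaled <- centered / s     # 0 / 0 is NaN in R, with no error or warning

print(s)
print(scaled)
```

Because R returns NaN silently here, the problem only surfaces later, when the model checks its inputs for missing data.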
Reproducible example
```r
library(tidymodels)  # rsample, recipes, parsnip, workflows

x1 <- sample(c(NaN, 5, 10), 300, replace = TRUE)  # outcome, contains NaN
x2 <- rep(1, 300)                                  # constant column: zero variance
x3 <- sample(c(0, 1), 300, replace = TRUE)
df1 <- data.frame(x1, x2, x3)

dfsplits <- initial_split(df1, strata = x3)
df_train <- training(dfsplits)
df_test <- testing(dfsplits)

df_recipe <-
  recipe(x1 ~ ., data = df_train) %>%
  step_normalize(all_numeric_predictors())

df_recipe %>%
  prep(df_train) %>%
  bake(df_test)
```
```
# A tibble: 76 x 3
      x2    x3    x1
   <dbl> <dbl> <dbl>
 1   NaN 0.946     5
 2   NaN -1.05   NaN
 3   NaN 0.946     5
 4   NaN -1.05     5
 5   NaN -1.05    10
 6   NaN 0.946    10
 7   NaN 0.946    10
 8   NaN 0.946    10
 9   NaN 0.946   NaN
10   NaN -1.05    10
# ... with 66 more rows
```

```r
df_model <- rand_forest(mtry = 50, trees = 150) %>%
  set_engine("ranger", importance = "impurity") %>%
  set_mode("regression")

df_workflow <-
  workflow() %>%
  add_model(df_model) %>%
  add_recipe(df_recipe)

df_workflow %>%
  fit(data = df_train)
```
```
# Error: Missing data in columns: x2.
```

I now know that, in this particular case, the problem can be avoided by adding step_zv() before step_normalize() to remove the zero-variance column, but it would be great if there had been some indication that step_normalize() had introduced the NaN values.
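A minimal sketch of the step_zv() workaround, using only the recipes package and the same simulated data (without the train/test split, for brevity). step_zv() is placed before step_normalize() so the constant column x2 is dropped before any scaling happens. The final all-NaN check is an illustrative addition of mine, not a recipes feature, and the native |> pipe assumes R 4.1 or later.

```r
library(recipes)

set.seed(1)  # reproducible simulated data

# Same simulated data as in the report; x2 is constant (zero variance)
df1 <- data.frame(
  x1 = sample(c(NaN, 5, 10), 300, replace = TRUE),
  x2 = rep(1, 300),
  x3 = sample(c(0, 1), 300, replace = TRUE)
)

# Remove zero-variance predictors first, then center and scale the rest
rec <- recipe(x1 ~ ., data = df1) |>
  step_zv(all_numeric_predictors()) |>
  step_normalize(all_numeric_predictors())

baked <- rec |>
  prep(training = df1) |>
  bake(new_data = df1)

# Illustrative check: fail loudly if any predictor column came out entirely NaN
predictors <- setdiff(names(baked), "x1")
all_nan <- vapply(baked[predictors], function(col) all(is.nan(col)), logical(1))
stopifnot(!any(all_nan))

print(names(baked))
```

In the original workflow, the same idea would mean adding step_zv(all_numeric_predictors()) to df_recipe ahead of step_normalize().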