For columns with zero variance, step_normalize() replaces the whole column with NaN values. I had a very hard time debugging this, because the problem only surfaced when I tried to feed the data to a model and got an Error: Missing data in columns:.
Reproducible example
library(tidymodels)  # loads recipes, rsample, and the %>% pipe

# x1: outcome with missing values; x2: constant (zero variance); x3: binary
x1 <- sample(c(NaN, 5, 10), 300, replace = TRUE)
x2 <- sample(1, 300, replace = TRUE)
x3 <- sample(c(0, 1), 300, replace = TRUE)
df1 <- data.frame(x1, x2, x3)

dfsplits <- initial_split(df1, strata = x3)
df_train <- training(dfsplits)
df_test <- testing(dfsplits)

df_recipe <-
  recipe(x1 ~ ., data = df_train) %>%
  step_normalize(all_numeric_predictors())

df_recipe %>%
  prep(df_train) %>%
  bake(df_test)
# A tibble: 76 x 3
      x2     x3    x1
   <dbl>  <dbl> <dbl>
 1   NaN  0.946     5
 2   NaN -1.05    NaN
 3   NaN  0.946     5
 4   NaN -1.05      5
 5   NaN -1.05     10
 6   NaN  0.946    10
 7   NaN  0.946    10
 8   NaN  0.946    10
 9   NaN  0.946   NaN
10   NaN -1.05     10
# ... with 66 more rows
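As far as I can tell, the NaN values in x2 come from the arithmetic of centering and scaling: a constant column has a standard deviation of 0, so every centered value is 0 and the division becomes 0 / 0. The base R sketch below illustrates that arithmetic only. It is an assumption about what happens inside step_normalize(), not a copy of its code, and the vector name constant_col is made up for the illustration.

```r
# A constant column, like x2 in the example above
constant_col <- rep(1, 5)

# Center and scale by hand: (x - mean) / sd
centered <- constant_col - mean(constant_col)  # all zeros
scaled <- centered / sd(constant_col)          # 0 / 0 for every element

is.nan(scaled)
```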
I now know that this can be mitigated for this particular case with step_zv() to remove the zero-variance column, but it would be great if there had been any indication that step_normalize() introduced the NaN values.
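For anyone who runs into the same thing, here is a minimal sketch of that workaround, assuming step_zv() is placed before step_normalize() so the constant column is already gone when the scaling statistics are computed. The toy data frame is made up for the illustration and is deterministic so the result does not depend on a random seed.

```r
library(recipes)

# Toy data: x1 is the outcome, x2 is constant (zero variance), x3 varies
toy <- data.frame(
  x1 = rep(c(5, 10), 50),
  x2 = rep(1, 100),
  x3 = rep(c(0, 1, 1, 0), 25)
)

rec <- recipe(x1 ~ ., data = toy) %>%
  step_zv(all_numeric_predictors()) %>%      # drop zero-variance predictors first
  step_normalize(all_numeric_predictors())   # then center and scale what is left

baked <- rec %>%
  prep(training = toy) %>%
  bake(new_data = toy)

names(baked)   # x2 should no longer be present
anyNA(baked)   # no NaN introduced by the normalization
```

The order matters here: with step_normalize() first, x2 would already have been turned into NaN before step_zv() gets to see it.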
We could check whether any of the standard deviations are approximately zero, e.g. sds < .Machine$double.eps, and throw an error naming the affected columns. (We could also suggest using step_zv().)
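A rough base R sketch of what such a check might look like is below. The helper names find_zero_sd_cols() and check_zero_sd() are hypothetical and not part of recipes, and the wording of the error message is only a suggestion. The tolerance follows the .Machine$double.eps idea from the comment above.

```r
# Hypothetical helper (not part of recipes): names of numeric columns whose
# standard deviation is numerically zero.
find_zero_sd_cols <- function(data, tol = .Machine$double.eps) {
  num <- data[vapply(data, is.numeric, logical(1))]
  sds <- vapply(num, sd, numeric(1), na.rm = TRUE)
  names(sds)[!is.na(sds) & sds < tol]
}

# Hypothetical helper: stop with an informative message if any are found.
check_zero_sd <- function(data) {
  bad <- find_zero_sd_cols(data)
  if (length(bad) > 0) {
    stop(
      "Columns with zero variance: ", paste(bad, collapse = ", "),
      ". Consider using step_zv() before normalizing.",
      call. = FALSE
    )
  }
  invisible(data)
}

# Usage on a small made-up data frame: x2 is constant
toy <- data.frame(x1 = c(NaN, 5, 10), x2 = c(1, 1, 1), x3 = c(0, 1, 0))
find_zero_sd_cols(toy)
```

Calling check_zero_sd(toy) would then fail early and name x2, instead of letting NaN values reach the model.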