Closed
Description
I did a speed comparison of the Yeo-Johnson transformation functions from the {recipes} and {bestNormalize} packages. There seems to be some overhead in {recipes}, especially for large datasets: on the 20 million values below, it takes about 79 s elapsed versus about 66 s for {bestNormalize}. I wonder if this can be optimized in some way.
library("bestNormalize")
library("recipes")
set.seed(123)
df = data.frame(x = rgamma(20000000, 1, 1))
### bestNormalize
system.time({
x1 = yeojohnson(df$x, standardize = FALSE, lower = -5, upper = 5)
})
#>    user  system elapsed
#>   60.06    6.31   66.38
hist(x1[["x.t"]])
### recipes
rec = recipe( ~ ., data = df)
rec = step_YeoJohnson(rec, x)
system.time({
estimates = prep(rec, training = df, retain = TRUE)
x2 = bake(estimates, new_data = NULL)
})
#>    user  system elapsed
#>   68.74   10.11   78.86
hist(x2[["x"]])
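To narrow down where the extra time goes, the two {recipes} stages could be timed separately. This is only a minimal sketch, assuming a much smaller sample (100,000 rows instead of 20 million) so that it runs quickly. The names `df_small`, `rec_small`, `est_small`, `x_small`, `t_prep` and `t_bake` are illustrative and not part of the comparison above, and timings at this size may not reflect the behavior on the full dataset.

```r
library("recipes")

set.seed(123)
# Assumption: a smaller sample than the 2e7 rows used above, for a quick check
df_small = data.frame(x = rgamma(100000, 1, 1))

rec_small = recipe( ~ ., data = df_small)
rec_small = step_YeoJohnson(rec_small, x)

# prep() estimates lambda; with retain = TRUE it also keeps the
# transformed training set
t_prep = system.time({
  est_small = prep(rec_small, training = df_small, retain = TRUE)
})

# bake() on explicit data applies the already estimated transformation
t_bake = system.time({
  x_small = bake(est_small, new_data = df_small)
})

print(t_prep)
print(t_bake)
```

If most of the time falls in `t_prep`, the overhead would be in the lambda estimation rather than in applying the transformation.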