The problem
I recently tried to implement an ensemble of parsnip models. This is easy for one particular model (see the bagging example for linear regression below), but it becomes more difficult as soon as I want a general model type that can take any parsnip model as its base model. Such a model should ideally:
- have its own list of arguments/hyperparameters such as number of resamples or number of features tried at each iteration
- expose the hyperparameters of its base model for hyperparameter tuning
Besides bagging, another example would be stacking of parsnip models.
Is there interest in adding such models or having an ensemble module in parsnip/tidymodels?
Does this already exist, or is it easy to implement and I simply missed it?
If yes to the former and no to the latter, do you have any suggestions as to how to best go about this?
Reproducible example
library(dplyr)
library(purrr)
library(rsample)
library(parsnip)
# Implementation of a simple lin-reg bagging routine ---------------------
# Divide data into train and test and re-sample the training set for bagging
set.seed(42)
train_test <- initial_split(mtcars)
btstrp <- bootstraps(training(train_test), times = 5)
# Randomly choose a subset of variables
btstrp <- btstrp %>%
  mutate(cols = rerun(n(), sample(2:ncol(mtcars), size = 4, replace = FALSE)))
# Fit a separate linear regression to each sample
lr <- linear_reg() %>% set_engine("lm")
btstrp <- btstrp %>%
  mutate(fit = map2(splits, cols, ~ fit(lr, mpg ~ ., analysis(.x)[, c(1, .y)])))
btstrp # Overall collection of models
#> # Bootstrap sampling
#> # A tibble: 5 x 4
#> splits id cols fit
#> * <list> <chr> <list> <list>
#> 1 <split [24/8]> Bootstrap1 <int [4]> <fit[+]>
#> 2 <split [24/10]> Bootstrap2 <int [4]> <fit[+]>
#> 3 <split [24/4]> Bootstrap3 <int [4]> <fit[+]>
#> 4 <split [24/8]> Bootstrap4 <int [4]> <fit[+]>
#> 5 <split [24/8]> Bootstrap5 <int [4]> <fit[+]>
btstrp$fit[[1]] # Fit on one bootstrap sample
#> parsnip model object
#>
#> Fit time: 0ms
#>
#> Call:
#> stats::lm(formula = formula, data = data)
#>
#> Coefficients:
#> (Intercept) qsec am vs hp
#> 38.51572 -0.77183 5.07365 3.20619 -0.05355
# Then make predictions by averaging the predictions of each submodel
btstrp <- btstrp %>%
  mutate(pred = map(fit, predict, new_data = testing(train_test)))
overall_pred <- btstrp$pred %>% reduce(`+`) %>% `/`(nrow(btstrp))
btstrp$pred[[1]] # prediction from one fit
#> # A tibble: 8 x 1
#> .pred
#> <dbl>
#> 1 25.0
#> 2 20.8
#> 3 16.0
#> 4 21.0
#> 5 12.8
#> 6 28.2
#> 7 16.0
#> 8 26.6
overall_pred # final (averaged) prediction made by the model
#> .pred
#> 1 21.02806
#> 2 21.79681
#> 3 15.44021
#> 4 22.91096
#> 5 12.17279
#> 6 27.78687
#> 7 14.71942
#> 8 26.54504
# The parsnip model type would then be a wrapper around the above ---------
# For example, the interface could look like this
proposed_model <-
  ensemble("regression", model = "linear_reg", resamples = 5, mtry = 4) %>%
  set_engine("lm")
fitted <- fit(proposed_model, mpg ~ ., data = training(train_test)) # returns a model object that essentially contains btstrp$fit
pred <- predict(fitted, new_data = testing(train_test)) # returns overall_pred
# If the base model has hyperparameters, these should also be exposed for
# tuning, for example via the tune package
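As a rough sketch of how the routine above might generalise to an arbitrary parsnip model specification, the bagging steps could be wrapped in two plain functions. The names `bag_fit()` and `bag_predict()` and their arguments are hypothetical and not part of parsnip. The sketch assumes a regression model whose predictions come back in a numeric `.pred` column, and it does not address exposing the base model's hyperparameters for tuning.

```r
library(parsnip)
library(rsample)
library(purrr)

# Hypothetical helper (not a parsnip function): fit `spec` on `times`
# bootstrap resamples of `data`, each using `n_features` randomly chosen
# predictors.
bag_fit <- function(spec, data, outcome, times = 5, n_features = 4) {
  predictors <- setdiff(names(data), outcome)
  resamples <- bootstraps(data, times = times)
  map(resamples$splits, function(split) {
    cols <- sample(predictors, size = n_features)
    fit(spec, reformulate(cols, response = outcome), data = analysis(split))
  })
}

# Hypothetical helper: average the numeric predictions of all submodels.
bag_predict <- function(fits, new_data) {
  preds <- map(fits, function(f) predict(f, new_data = new_data)$.pred)
  tibble::tibble(.pred = reduce(preds, `+`) / length(fits))
}

set.seed(42)
train_test <- initial_split(mtcars)

# Any regression specification could be passed here; linear regression
# matches the example above.
spec <- set_engine(linear_reg(), "lm")

fits <- bag_fit(spec, training(train_test), outcome = "mpg")
bag_pred <- bag_predict(fits, testing(train_test))
```

Because `bag_fit()` only relies on `fit()` and `predict()`, swapping in another regression specification should require no other changes. Turning this into a proper model type with its own arguments is the open question of this issue.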