Creating an ensemble of parsnip models that is itself a (tunable) parsnip model


## The problem

I recently tried to implement an ensemble of parsnip models. This is easy to do for one particular model (see the example of bagging linear regressions below), but it becomes more difficult as soon as I try to implement a general model type that can take any parsnip model as its base model. Ideally, this model should:

- have its own list of arguments/hyperparameters such as number of resamples or number of features tried at each iteration
- expose the hyperparameters of its base model for hyperparameter tuning

Another example besides bagging could be to allow for model stacking of parsnip models.

Is there interest in adding such models, or an ensemble module, to parsnip/tidymodels?
Does this already exist, or is it easy to implement and I just missed it?
If there is interest and it does not exist yet, do you have any suggestions on how best to go about this?

## Reproducible example
``` r

library(dplyr)
library(purrr)
library(rsample)
library(parsnip)

# Implementation of a simple lin-reg bagging routine  ---------------------

# Divide data into train and test and re-sample the training set for bagging
set.seed(42)
train_test <- initial_split(mtcars)
btstrp <- bootstraps(training(train_test), times = 5)

# Randomly choose a subset of variables
btstrp <- btstrp %>% 
  mutate(cols = rerun(n(), sample(2:ncol(mtcars), size = 4, replace = FALSE)))

# Fit a separate linear regression to each sample
lr <- linear_reg() %>% set_engine("lm")
btstrp <- btstrp %>% 
  mutate(fit = map2(splits, cols, ~ fit(lr, mpg ~ ., analysis(.x)[, c(1, .y)])))

btstrp           # Overall collection of models
#> # Bootstrap sampling 
#> # A tibble: 5 x 4
#>   splits          id         cols      fit     
#> * <list>          <chr>      <list>    <list>  
#> 1 <split [24/8]>  Bootstrap1 <int [4]> <fit[+]>
#> 2 <split [24/10]> Bootstrap2 <int [4]> <fit[+]>
#> 3 <split [24/4]>  Bootstrap3 <int [4]> <fit[+]>
#> 4 <split [24/8]>  Bootstrap4 <int [4]> <fit[+]>
#> 5 <split [24/8]>  Bootstrap5 <int [4]> <fit[+]>
btstrp$fit[[1]]  # Fit on one bootstrap sample
#> parsnip model object
#> 
#> Fit time:  0ms 
#> 
#> Call:
#> stats::lm(formula = formula, data = data)
#> 
#> Coefficients:
#> (Intercept)         qsec           am           vs           hp  
#>    38.51572     -0.77183      5.07365      3.20619     -0.05355

# Then make predictions by averaging the predictions of each submodel
btstrp <- btstrp %>% 
  mutate(pred = map(fit, predict, new_data = testing(train_test)))

overall_pred <- btstrp$pred %>% reduce(`+`) %>% `/`(nrow(btstrp))

btstrp$pred[[1]] # prediction from one fit
#> # A tibble: 8 x 1
#>   .pred
#>   <dbl>
#> 1  25.0
#> 2  20.8
#> 3  16.0
#> 4  21.0
#> 5  12.8
#> 6  28.2
#> 7  16.0
#> 8  26.6
overall_pred     # final (averaged) prediction made by the model
#>      .pred
#> 1 21.02806
#> 2 21.79681
#> 3 15.44021
#> 4 22.91096
#> 5 12.17279
#> 6 27.78687
#> 7 14.71942
#> 8 26.54504


# The parsnip model type would then be a wrapper around the above ---------
# For example, the interface could look like this

proposed_model <- 
  ensemble("regression", model = "linear_reg", resamples = 5, mtry = 4) %>% 
  set_engine("lm")

fitted <- fit(proposed_model, mpg ~ ., data = training(train_test)) # returns a model object that essentially contains btstrp$fit
pred   <- predict(fitted, new_data = testing(train_test))           # returns overall_pred

# If the base model has hyperparameters, these should also be exposed for
# tuning, for example via the `tune` package
```
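To make the idea a bit more concrete, here is a minimal sketch of how the bagging routine above could be generalized to take any parsnip model specification. The names `bag_fit()` and `predict.bag_fit()` are hypothetical helpers written for this issue, not part of parsnip, and the sketch assumes a regression model whose `predict()` returns a `.pred` column. It does not address exposing the base model's hyperparameters for tuning.

``` r
library(purrr)
library(rsample)
library(parsnip)

# Hypothetical helper (not part of parsnip): fit `base_spec` on `times`
# bootstrap resamples, each using `mtry` randomly chosen predictors.
bag_fit <- function(base_spec, data, outcome, times = 5, mtry = 4) {
  predictors <- setdiff(names(data), outcome)
  resamples <- bootstraps(data, times = times)
  fits <- map(resamples$splits, function(split) {
    cols <- sample(predictors, size = min(mtry, length(predictors)))
    fit(base_spec, reformulate(cols, response = outcome), data = analysis(split))
  })
  structure(list(fits = fits), class = "bag_fit")
}

# Average the numeric predictions of all submodels (regression only).
predict.bag_fit <- function(object, new_data, ...) {
  preds <- map(object$fits, function(f) predict(f, new_data = new_data)$.pred)
  tibble::tibble(.pred = reduce(preds, `+`) / length(preds))
}

# Usage with the same base model as in the example above
set.seed(42)
train_test <- initial_split(mtcars)
bagged <- bag_fit(set_engine(linear_reg(), "lm"), training(train_test), "mpg")
bagged_pred <- predict(bagged, new_data = testing(train_test))
```

Because the base model is passed in as an ordinary specification, any arguments already set on it are carried through to each submodel. What is missing is the parsnip registration that would let the ensemble itself behave like a model type with its own arguments.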


