Thanks for adding group_bootstraps()! One quick problem from my testing: Currently, group_bootstraps() throws an error when a resample reproduces the original dataset, i.e. every group is drawn and the assessment set is empty. This happens easily when the number of resamples is larger than the number of group combinations:
library(rsample)
library(tidyverse)
set.seed(2)
dat1 <- tibble::tibble(a = 1:20, b = letters[1:20], c = rep(1:4, 5))
d <- dat1 |>
group_bootstraps(c)
#> Error in `group_bootstraps()`:
#> ! Some assessment sets contained zero rows
#> ℹ Consider using a non-grouped resampling method
Here the default of 25 resamples (times = 25) of 4 groups hits the problem.
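A rough back-of-the-envelope check of how likely this is, as a sketch only. It assumes each resample draws 4 groups uniformly with replacement, which may not match the internals exactly:

```r
# Assumption: each resample draws n_groups groups uniformly, with replacement.
n_groups <- 4
times <- 25

# A resample has an empty assessment set only when every group is drawn,
# i.e. the draw is a permutation of the groups.
p_all_groups <- factorial(n_groups) / n_groups^n_groups

# Chance that at least one of the resamples is affected.
p_any_empty <- 1 - (1 - p_all_groups)^times

p_all_groups
p_any_empty
```

Under these assumptions a single resample keeps every group about 9% of the time, so with 25 resamples at least one empty assessment set is likely (roughly 91%).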
But I don't think an error/dead end is the correct design here. There are bootstrap workflows where it is fine to resample the original data. For example, suppose you just want to fit a bunch of models on resampled groups:
- bootstrap like 2000 datasets
- fit a model on each one
- estimate something on each model
- report the mean and 95% interval of the model estimates
The assessment split data is never used in this workflow. Requiring non-empty assessment splits also means we lose the ability to average over resamples that just happen to be the apparent dataset, so it's kind of a bias thing too.
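To make the workflow above concrete, here is a minimal base-R sketch that never touches an assessment set. `resample_groups()` is a hypothetical helper written for this illustration, not an rsample function, and the intercept-only `lm()` is just a stand-in for whatever model you would actually fit:

```r
set.seed(2)
dat1 <- data.frame(a = 1:20, b = letters[1:20], c = rep(1:4, 5))

# Hypothetical helper: draw whole groups with replacement and stack their rows.
resample_groups <- function(data, group) {
  ids <- unique(data[[group]])
  drawn <- sample(ids, size = length(ids), replace = TRUE)
  pieces <- lapply(drawn, function(g) data[data[[group]] == g, , drop = FALSE])
  do.call(rbind, pieces)
}

# Bootstrap 2000 datasets, fit a model on each, and keep one estimate per model.
estimates <- vapply(seq_len(2000), function(i) {
  boot <- resample_groups(dat1, "c")
  fit <- lm(a ~ 1, data = boot)  # placeholder model
  coef(fit)[[1]]
}, numeric(1))

# Report the mean and a 95% percentile interval of the estimates.
mean(estimates)
quantile(estimates, c(0.025, 0.975))
```

Resamples that happen to contain every group are kept here like any other draw, which is the behavior I'd like to be able to get from group_bootstraps().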
Two possible suggestions:
- allow the user to permit empty assessment splits with an argument, like group_bootstraps(..., allow_empty_assessment = TRUE).
- lower the condition from an error to a warning.
The second option is more dangerous because it means users will hit preventable errors when they try to use assessment(), so I like the first one better.
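A sketch of what the opt-in check could look like. This is purely illustrative: `check_assessment_sizes()` and `allow_empty_assessment` are hypothetical names from this proposal, not existing rsample internals or arguments:

```r
# Hypothetical check, not rsample's actual implementation.
# `sizes` holds the number of rows in each assessment set.
check_assessment_sizes <- function(sizes, allow_empty_assessment = FALSE) {
  has_empty <- any(sizes == 0)
  if (has_empty && !allow_empty_assessment) {
    stop("Some assessment sets contained zero rows", call. = FALSE)
  }
  # Return whether any empty sets were found, without printing it.
  invisible(has_empty)
}

sizes <- c(5, 0, 10)

# Default: same error as today.
res_default <- tryCatch(
  check_assessment_sizes(sizes),
  error = function(e) conditionMessage(e)
)

# Opt-in: empty assessment sets are allowed through.
res_opt_in <- check_assessment_sizes(sizes, allow_empty_assessment = TRUE)
```

The default keeps the current error, so only users who explicitly opt in can end up with empty assessment sets.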
Created on 2022-08-08 by the reprex package (v2.0.1)