group_bootstraps() errors when it gets original dataset in a resample #356

@tjmahr

Thanks for adding group_bootstraps()! One problem from my testing: group_bootstraps() currently throws an error whenever a resample reproduces the original dataset, that is, contains every group, because the assessment set for that resample is then empty. This happens easily when the number of resamples is larger than the number of group combinations:

library(rsample)
library(tidyverse)

set.seed(2)
dat1 <- tibble::tibble(a = 1:20, b = letters[1:20], c = rep(1:4, 5))

d <- dat1 |> 
  group_bootstraps(c) 
#> Error in `group_bootstraps()`:
#> ! Some assessment sets contained zero rows
#> ℹ Consider using a non-grouped resampling method

Here, 25 resamples of 4 groups are enough to hit the problem.
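
To get a feel for how likely this is, here is a rough back-of-the-envelope sketch in base R. It assumes a simplified model in which each resample draws 4 group labels with replacement; that is my assumption for illustration, not necessarily how group_bootstraps() samples internally.

```r
# Simplified model (an assumption, not rsample's actual code):
# each resample draws `n_groups` group labels with replacement.
n_groups <- 4
times <- 25

# Probability that a single resample contains every group exactly once,
# which leaves nothing for the assessment set: 4! / 4^4
p_all <- factorial(n_groups) / n_groups^n_groups

# Probability that at least one of `times` resamples contains every group
p_any <- 1 - (1 - p_all)^times

# Simulation check of p_all under the same simplified model
set.seed(1)
draws <- replicate(
  10000,
  length(unique(sample(n_groups, n_groups, replace = TRUE))) == n_groups
)
p_all_sim <- mean(draws)

p_all  # 0.09375
p_any  # roughly 0.91
```

Under that model, about 9% of individual resamples contain all 4 groups, so a run of 25 resamples is more likely than not to include at least one.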

But I don't think an error (a dead end) is the correct design here. There are bootstrap workflows where it is fine for a resample to be the original data. For example, suppose you just want to fit a bunch of models on resampled groups:

  • bootstrap like 2000 datasets
  • fit a model on each one
  • estimate something on each model
  • report the mean and 95% interval of the model estimates

The assessment split data is never used in this workflow. Requiring non-empty assessment splits also means we lose the ability to average over resamples that just happen to be the apparent dataset, so it's kind of a bias thing too.
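
As a minimal sketch of that workflow, here it is in base R with a hand-rolled group resampler. The helper resample_groups() and the outcome column y are hypothetical and only for illustration; this is not rsample code. Resamples that happen to contain every group are kept like any other.

```r
set.seed(2)
dat <- data.frame(a = 1:20, c = rep(1:4, 5))
dat$y <- 2 * dat$a + rnorm(20)  # hypothetical outcome, added for illustration

# Draw whole groups with replacement and stack their rows.
# Hand-rolled helper for this sketch, not part of rsample.
resample_groups <- function(data, group) {
  ids <- unique(data[[group]])
  drawn <- sample(ids, length(ids), replace = TRUE)
  do.call(
    rbind,
    lapply(drawn, function(g) data[data[[group]] == g, , drop = FALSE])
  )
}

# Fit a model on each resample and keep one estimate (the slope of y on a).
# Only the analysis data is used; no assessment set is ever needed.
estimates <- replicate(2000, {
  boot <- resample_groups(dat, "c")
  coef(lm(y ~ a, data = boot))[["a"]]
})

# Report the mean and a 95% percentile interval of the estimates
mean(estimates)
quantile(estimates, c(0.025, 0.975))
```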

Two possible suggestions:

  • let the user permit empty assessment splits with an argument, something like group_bootstraps(..., allow_empty_assessment = TRUE).
  • downgrade the condition from an error to a warning.

The second option is more dangerous because it means users will hit preventable errors when they try to use assessment(), so I like the first one better.
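
To make the first option concrete, here is a rough sketch of the kind of check I have in mind. check_assessment() is a hypothetical stand-alone helper, and allow_empty_assessment is only the argument name suggested above; neither exists in rsample as far as this issue is concerned.

```r
# Hypothetical helper for illustration only, not rsample code.
# `n_assessment` is the number of assessment rows in each resample.
check_assessment <- function(n_assessment, allow_empty_assessment = FALSE) {
  if (any(n_assessment == 0) && !allow_empty_assessment) {
    stop("Some assessment sets contained zero rows", call. = FALSE)
  }
  invisible(n_assessment)
}

n_rows <- c(5, 0, 10)

# Default behavior: still an error, so nobody hits an empty assessment()
# by surprise
res <- try(check_assessment(n_rows), silent = TRUE)
inherits(res, "try-error")

# Opt-in: the empty split is allowed through
check_assessment(n_rows, allow_empty_assessment = TRUE)
```

The default stays strict, and only users who know they will never touch the assessment data opt in.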

Created on 2022-08-08 by the reprex package (v2.0.1)
