Skip to content

split_to_rset() uses randomness when it doesn't need to #264

@DavisVaughan

Description

@DavisVaughan

Extracted from #11 (comment), because I now believe this is a separate standalone issue.

split_to_rset() calls rsample::mc_cv() just to get the structure of that rset subclass, but then we override the results with x. HOWEVER, it relies on randomness to generate the object that we overwrite. This causes some confusion with reproducibility between last_fit() and direct usage of fit()/predict().

> tune:::split_to_rset
function (x) 
{
    prop <- length(x$in_id)/nrow(x$data)
    res <- rsample::mc_cv(x$data, times = 1, prop = prop)
    res$splits[[1]] <- x
    res
}

Since we really just need to generate the structure the rset object, we should consider using the below rewrite instead. It is still suboptimal because it hardcodes the structure of an mc_cv rset, but I can live with that.

# Manually construct an `mc_cv()` rset.
# Don't call `mc_cv()` directly, as that will mess with the random seed
split_to_rset <- function(x) {
  times <- 1L
  prop <- length(x$in_id) / nrow(x$data)
  strata <- FALSE
  
  attrib <- list(prop = prop, times = times, strata = strata)
  
  splits <- list(x)
  
  ids <- "Resample1"
  
  rsample::new_rset(
    splits = list(x), 
    ids = ids,
    attrib = attrib,
    subclass = c("mc_cv", "rset")
  )
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugan unexpected problem or unintended behaviorfeaturea feature request or enhancement

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions