split_to_rset() uses randomness when it doesn't need to #264

DavisVaughan · 2020-08-18T19:02:28Z

Extracted from #11 (comment), because I now believe this is a separate standalone issue.

split_to_rset() calls rsample::mc_cv() just to get the structure of that rset subclass, but then we override the results with x. HOWEVER, it relies on randomness to generate the object that we overwrite. This causes some confusion with reproducibility between last_fit() and direct usage of fit()/predict().

> tune:::split_to_rset
function (x) 
{
    prop <- length(x$in_id)/nrow(x$data)
    res <- rsample::mc_cv(x$data, times = 1, prop = prop)
    res$splits[[1]] <- x
    res
}
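A minimal sketch of the side effect, assuming rsample is installed and using `mtcars` purely as stand-in data: if building a throwaway `mc_cv()` rset consumes random numbers, the next draw after the same seed should no longer match the reference draw.

```r
library(rsample)

# Reference draw: the value runif() gives immediately after seeding.
set.seed(123)
expected <- runif(1)

# Same seed, but build a throwaway mc_cv() rset before drawing,
# mirroring what split_to_rset() does internally.
set.seed(123)
throwaway <- mc_cv(mtcars, times = 1, prop = 0.75)
observed <- runif(1)

# TRUE would mean mc_cv() left the RNG stream alone.
seed_untouched <- identical(expected, observed)
print(seed_untouched)
```

This is the kind of mismatch that can make `last_fit()` and a direct `fit()`/`predict()` run under the same seed disagree whenever a later step draws random numbers.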

Since we really just need to generate the structure of the rset object, we should consider using the rewrite below instead. It is still suboptimal because it hardcodes the structure of an mc_cv rset, but I can live with that.

# Manually construct an `mc_cv()` rset.
# Don't call `mc_cv()` directly, as that will mess with the random seed
split_to_rset <- function(x) {
  times <- 1L
  prop <- length(x$in_id) / nrow(x$data)
  strata <- FALSE
  
  attrib <- list(prop = prop, times = times, strata = strata)
  
  splits <- list(x)
  
  ids <- "Resample1"
  
  rsample::new_rset(
    splits = splits,
    ids = ids,
    attrib = attrib,
    subclass = c("mc_cv", "rset")
  )
}
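A quick way to check that the manual construction behaves as intended, sketched under a few assumptions: `split_to_rset_manual()` is a hypothetical name for a condensed copy of the rewrite above, `mtcars` is stand-in data, and the split comes from `rsample::initial_split()`. The check compares `.Random.seed` before and after the call.

```r
library(rsample)

# Hypothetical, condensed copy of the proposed rewrite (illustration only).
split_to_rset_manual <- function(x) {
  prop <- length(x$in_id) / nrow(x$data)
  new_rset(
    splits = list(x),
    ids = "Resample1",
    attrib = list(prop = prop, times = 1L, strata = FALSE),
    subclass = c("mc_cv", "rset")
  )
}

set.seed(123)
split <- initial_split(mtcars, prop = 0.75)

# Snapshot the RNG state around the call.
seed_before <- .Random.seed
rset <- split_to_rset_manual(split)
seed_after <- .Random.seed

# The RNG state should be unchanged, and the rset should wrap the original split.
rng_unchanged <- identical(seed_before, seed_after)
same_rows <- identical(rset$splits[[1]]$in_id, split$in_id)
print(c(rng_unchanged = rng_unchanged, same_rows = same_rows))
```

If both values come back `TRUE`, the rset has been built without touching the seed, which is the property the rewrite is after.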

github-actions · 2021-03-06T00:10:50Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

juliasilge added the bug (an unexpected problem or unintended behavior) and feature (a feature request or enhancement) labels Aug 18, 2020

DavisVaughan mentioned this issue Sep 3, 2020

Construct a "manual" rset for usage in last_fit() #273

Merged

DavisVaughan closed this as completed in #273 Sep 14, 2020

github-actions bot locked and limited conversation to collaborators Mar 6, 2021
