This is a cross-post from a `caret` issue that affects `recipes`.
Right now, `caret` preps a recipe twice: once to generate the dataset that `finalModel` is trained on, and then, in a second step outside the `finalModel` training, again to produce the prepped recipe that is returned with the rest of the `train` output. The latter is what `caret` uses to transform new data at prediction time.
This means that any recipe steps that are not deterministic (I think the main cases that would affect users now are `step_upsample` and `step_downsample`) will lead to a `caret` `finalModel` that is not trained on the same dataset as the preprocessing recipe included for later use. This is likely to be an invisible bug, except in cases where the difference between the randomly generated up/down-samples drawn during the two recipe preparations becomes large enough to cause downstream changes in (e.g.) which features are retained.
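A minimal base-R illustration of the underlying problem (not using `recipes` itself): running the same non-deterministic sampling code twice, with no shared seed, generally yields different rows, so two independent preps of a recipe containing a sampling step need not produce the same training set.

```r
# Simulate two independent "preps" of a down-sampling step with no
# shared seed: each draws its own random subset of rows.
rows_prep1 <- sample(nrow(mtcars), 16)
rows_prep2 <- sample(nrow(mtcars), 16)

# The two draws almost certainly select different rows, so a model
# trained on the first subset would not match a recipe baked from
# the second.
identical(sort(rows_prep1), sort(rows_prep2))
```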
The way to solve this would be to add `seed` arguments to the sampling steps (defaulting them to something like `sample.int(10^5, 1)`) and to execute the sampling code using `withr::with_seed()`, so that every prep of the same step draws the same rows.
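A hypothetical sketch of that proposal (the constructor name and structure here are illustrative, not the actual `recipes` internals): the step records a seed at creation time, and every execution of its sampling code is wrapped in `withr::with_seed()`, so both preps draw identical rows.

```r
library(withr)

# Illustrative stand-in for a sampling step: the seed is drawn once,
# when the step is created, and reused on every execution.
make_sampling_step <- function(seed = sample.int(10^5, 1)) {
  step <- list(seed = seed)
  step$bake <- function(data) {
    withr::with_seed(step$seed, {
      # Non-deterministic work, e.g. down-sampling to half the rows.
      data[sample(nrow(data), size = floor(nrow(data) / 2)), , drop = FALSE]
    })
  }
  step
}

step <- make_sampling_step()
a <- step$bake(mtcars)
b <- step$bake(mtcars)
identical(a, b)  # TRUE: the stored seed makes repeated preps agree
```

Because the default seed is itself drawn randomly, users who never set it still get varied samples across separately created steps; the determinism applies only to re-executions of the same step object.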
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.