more explicit seed setting for tune_grid #11

topepo · 2019-08-16T14:25:59Z

Perhaps take the caret::train() route: accept a vector of seeds as a control argument and wrap the different modules in with_seed() calls.
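A rough sketch of what that suggestion could look like, assuming `with_seed()` here means `withr::with_seed()`. The helper `run_with_seeds()` and the two toy steps are hypothetical and are not part of tune or caret; they only illustrate pairing each module with its own seed.

```r
library(withr)

# Hypothetical helper: evaluate each step under its own seed, so a step's
# result does not depend on how much randomness earlier steps consumed.
run_with_seeds <- function(steps, seeds) {
  stopifnot(length(steps) == length(seeds))
  Map(function(step, seed) with_seed(seed, step()), steps, seeds)
}

# Toy stand-ins for the "modules" (resampling, model fitting, ...).
steps <- list(
  resample = function() sample(10),
  fit      = function() rnorm(3)
)

# The vector of seeds that a control argument might carry.
seeds <- c(101L, 202L)

first  <- run_with_seeds(steps, seeds)
second <- run_with_seeds(steps, seeds)

# Both runs give the same results for every step.
identical(first, second)
```

Because `with_seed()` restores the previous RNG state on exit, the caller's own random stream is left untouched by the seeded steps.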

juliasilge · 2020-08-18T18:42:12Z

Folks are running into unexpected results with seed setting for last_fit(); this looks bad enough to me to escalate to a bug.

library(tidymodels)

set.seed(123)
tr_te_split <- initial_split(mtcars)

rf_spec <- rand_forest() %>%
  set_mode("regression") %>%
  set_engine("ranger")

rf_wf <- workflow() %>%
  add_model(rf_spec) %>%
  add_formula(mpg ~ .)

set.seed(345)
last_rf_fit <- last_fit(rf_wf, split = tr_te_split)
collect_predictions(last_rf_fit)
#> # A tibble: 8 x 4
#>   id               .pred  .row   mpg
#>   <chr>            <dbl> <int> <dbl>
#> 1 train/test split  24.8     3  22.8
#> 2 train/test split  18.8    10  19.2
#> 3 train/test split  16.8    14  15.2
#> 4 train/test split  13.8    15  10.4
#> 5 train/test split  28.0    18  32.4
#> 6 train/test split  29.0    19  30.4
#> 7 train/test split  16.9    22  15.5
#> 8 train/test split  15.4    31  15


set.seed(345)
last_rf_fit_2 <- fit(rf_wf, training(tr_te_split))
predict(last_rf_fit_2, testing(tr_te_split))
#> # A tibble: 8 x 1
#>   .pred
#>   <dbl>
#> 1  24.3
#> 2  18.6
#> 3  16.8
#> 4  13.5
#> 5  28.0
#> 6  29.1
#> 7  17.0
#> 8  15.4

Created on 2020-08-18 by the reprex package (v0.3.0.9001)

hnagaty · 2020-08-19T06:38:46Z

It's worth mentioning that I tried setting ranger's own seed argument via set_engine("ranger", seed = 123). When I did, the predictions from last_fit() and fit() were identical.

library(tidymodels)
#> ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────── tidymodels 0.1.1 ──
#> ✓ broom     0.7.0      ✓ recipes   0.1.13
#> ✓ dials     0.0.8      ✓ rsample   0.0.7 
#> ✓ dplyr     1.0.0      ✓ tibble    3.0.3 
#> ✓ ggplot2   3.3.2      ✓ tidyr     1.1.0 
#> ✓ infer     0.5.3      ✓ tune      0.1.1 
#> ✓ modeldata 0.0.2      ✓ workflows 0.1.2 
#> ✓ parsnip   0.1.2      ✓ yardstick 0.0.7 
#> ✓ purrr     0.3.4
#> ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter()  masks stats::filter()
#> x dplyr::lag()     masks stats::lag()
#> x recipes::step()  masks stats::step()

set.seed(123)
tr_te_split <- initial_split(mtcars)

rf_spec <- rand_forest() %>%
  set_mode("regression") %>%
  set_engine("ranger", seed = 123)

rf_wf <- workflow() %>%
  add_model(rf_spec) %>%
  add_formula(mpg ~ .)

set.seed(345)
last_rf_fit <- last_fit(rf_wf, split = tr_te_split)
collect_predictions(last_rf_fit)
#> # A tibble: 8 x 4
#>   id               .pred  .row   mpg
#>   <chr>            <dbl> <int> <dbl>
#> 1 train/test split  24.8     3  22.8
#> 2 train/test split  18.8    10  19.2
#> 3 train/test split  16.5    14  15.2
#> 4 train/test split  13.6    15  10.4
#> 5 train/test split  28.2    18  32.4
#> 6 train/test split  29.2    19  30.4
#> 7 train/test split  17.3    22  15.5
#> 8 train/test split  15.3    31  15


set.seed(345)
last_rf_fit_2 <- fit(rf_wf, training(tr_te_split))
predict(last_rf_fit_2, testing(tr_te_split))
#> # A tibble: 8 x 1
#>   .pred
#>   <dbl>
#> 1  24.8
#> 2  18.8
#> 3  16.5
#> 4  13.6
#> 5  28.2
#> 6  29.2
#> 7  17.3
#> 8  15.3

Created on 2020-08-19 by the reprex package (v0.3.0)
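One plausible reading of the two reprexes, sketched in base R only: if one code path draws from the RNG before the model is fit, the fit sees a different random stream even though `set.seed(345)` was called both times, whereas a seed applied at the point of fitting makes the earlier draw irrelevant. `toy_fit()` is a hypothetical stand-in for a stochastic fit and is not ranger; the extra `runif(1)` is an assumed stand-in for any internal step that uses randomness, not tune's actual internals.

```r
# Hypothetical stand-in for a stochastic model fit (not ranger itself).
# If `seed` is given, it is applied right before the random work, mimicking
# an engine-level seed argument.
toy_fit <- function(seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  mean(sample(100, 10))
}

# Path A: fit immediately after set.seed().
set.seed(345)
a <- toy_fit()

# Path B: something draws a random number before the fit.
set.seed(345)
invisible(runif(1))
b <- toy_fit()

# a and b come from different points in the RNG stream, so they may differ.

# With a seed applied inside the fit, both paths agree.
set.seed(345)
a_seeded <- toy_fit(seed = 123)

set.seed(345)
invisible(runif(1))
b_seeded <- toy_fit(seed = 123)

identical(a_seeded, b_seeded)
```

Note that `toy_fit(seed = ...)` resets the global RNG state as a side effect; wrapping the body in `withr::with_seed()` would avoid that.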

juliasilge · 2020-10-07T20:08:34Z

Closed in #275 🎉

github-actions · 2021-03-06T00:10:37Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

topepo added the "feature" label (a feature request or enhancement) Apr 3, 2020

juliasilge added the "bug" label (an unexpected problem or unintended behavior) Aug 18, 2020

DavisVaughan mentioned this issue Aug 18, 2020

split_to_rset() uses randomness when it doesn't need to #264

Closed

juliasilge removed the "bug" label (an unexpected problem or unintended behavior) Aug 18, 2020

topepo added a commit that referenced this issue Sep 13, 2020

Changes for #11

e1167b6

juliasilge closed this as completed Oct 7, 2020

github-actions bot locked and limited conversation to collaborators Mar 6, 2021
