Tuning trees for C5_rules doesn't change performance #49

Closed
DesmondChoy opened this issue Mar 25, 2022 · 6 comments · Fixed by #58
Labels: bug (an unexpected problem or unintended behavior)

@DesmondChoy

The problem

Tuning `trees` when using `parsnip::C5_rules()` doesn't change performance: every candidate value gives the same metrics (see the ggplot below).
I'm not sure whether this is a bug or intended. If it's the latter, should `trees` be removed as a tuning parameter to avoid confusion?

Reproducible example

library(tidyverse)
library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
#> Warning: package 'parsnip' was built under R version 4.1.3
library(rules)
#> 
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#> 
#>     max_rules
library(palmerpenguins)
#> Warning: package 'palmerpenguins' was built under R version 4.1.3

set.seed(2022)
penguins_split <- initial_split(penguins)
penguins_training <- training(penguins_split)
penguins_testing <- testing(penguins_split)

folds <- vfold_cv(penguins_training, v = 5)

simple_rec <- penguins_training %>% 
  recipe(species ~ .)

C5_model <- C5_rules(trees = tune()) %>% 
  set_engine("C5.0")

penguins_wf <- workflow() %>% 
  add_recipe(simple_rec) %>% 
  add_model(C5_model)

penguins_tuning <- tune_grid(
  object = penguins_wf,
  resamples = folds,
  grid = 30,
  control = control_grid(save_pred = TRUE, verbose = TRUE),
  metrics = metric_set(accuracy, roc_auc, f_meas)
)
#> Warning: package 'C50' was built under R version 4.1.3
#> i Fold1: preprocessor 1/1
#> v Fold1: preprocessor 1/1
#> i Fold1: preprocessor 1/1, model 1/1
#> v Fold1: preprocessor 1/1, model 1/1
#> i Fold1: preprocessor 1/1, model 1/1 (predictions)
#> i Fold2: preprocessor 1/1
#> v Fold2: preprocessor 1/1
#> i Fold2: preprocessor 1/1, model 1/1
#> v Fold2: preprocessor 1/1, model 1/1
#> i Fold2: preprocessor 1/1, model 1/1 (predictions)
#> i Fold3: preprocessor 1/1
#> v Fold3: preprocessor 1/1
#> i Fold3: preprocessor 1/1, model 1/1
#> v Fold3: preprocessor 1/1, model 1/1
#> i Fold3: preprocessor 1/1, model 1/1 (predictions)
#> i Fold4: preprocessor 1/1
#> v Fold4: preprocessor 1/1
#> i Fold4: preprocessor 1/1, model 1/1
#> v Fold4: preprocessor 1/1, model 1/1
#> i Fold4: preprocessor 1/1, model 1/1 (predictions)
#> i Fold5: preprocessor 1/1
#> v Fold5: preprocessor 1/1
#> i Fold5: preprocessor 1/1, model 1/1
#> v Fold5: preprocessor 1/1, model 1/1
#> i Fold5: preprocessor 1/1, model 1/1 (predictions)

collect_metrics(penguins_tuning) %>% 
  ggplot(aes(mean, trees, color = .metric)) +
  geom_point(show.legend = FALSE) +
  facet_wrap(~ .metric, scales = "free")

Created on 2022-03-25 by the reprex package (v2.0.1)
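If it helps, the flat curves can also be confirmed numerically instead of by eye. Below is a minimal base-R sketch. It assumes `trees` is the only tuned parameter and that the metrics table has the `.metric` and `mean` columns returned by `collect_metrics()`. The helper name `flat_metrics()` is made up for this example, and the toy numbers are invented, not results from the reprex.

```r
# Hypothetical helper (not part of tune): names of metrics whose mean
# estimate is identical across every tuned value of `trees`.
flat_metrics <- function(metrics, tol = 1e-12) {
  # Group the resampled mean estimates by metric name
  by_metric <- split(metrics$mean, metrics$.metric)
  # A metric is "flat" when its spread across candidates is (numerically) zero
  is_flat <- vapply(by_metric, function(m) diff(range(m)) <= tol, logical(1))
  names(is_flat)[is_flat]
}

# Toy input shaped like collect_metrics() output; the numbers are invented
toy <- data.frame(
  trees   = rep(c(1, 50, 99), times = 2),
  .metric = rep(c("accuracy", "roc_auc"), each = 3),
  mean    = c(0.95, 0.95, 0.95, 0.97, 0.98, 0.99)
)
flat_metrics(toy)
#> [1] "accuracy"

# On the reprex above this would be: flat_metrics(collect_metrics(penguins_tuning))
```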

Session info
sessioninfo::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Windows 10 x64 (build 19042)
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_Singapore.1252
#>  ctype    English_Singapore.1252
#>  tz       Asia/Kuala_Lumpur
#>  date     2022-03-25
#>  pandoc   2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown)
#> 
#> - Packages -------------------------------------------------------------------
#>  package        * version    date (UTC) lib source
#>  assertthat       0.2.1      2019-03-21 [1] CRAN (R 4.1.2)
#>  backports        1.4.1      2021-12-13 [1] CRAN (R 4.1.2)
#>  broom          * 0.7.12     2022-01-28 [1] CRAN (R 4.1.2)
#>  C50            * 0.1.6      2022-02-05 [1] CRAN (R 4.1.3)
#>  cellranger       1.1.0      2016-07-27 [1] CRAN (R 4.1.2)
#>  class            7.3-19     2021-05-03 [2] CRAN (R 4.1.2)
#>  cli              3.1.0      2021-10-27 [1] CRAN (R 4.1.2)
#>  codetools        0.2-18     2020-11-04 [2] CRAN (R 4.1.2)
#>  colorspace       2.0-2      2021-06-24 [1] CRAN (R 4.1.2)
#>  crayon           1.4.2      2021-10-29 [1] CRAN (R 4.1.2)
#>  Cubist           0.4.0      2022-02-05 [1] CRAN (R 4.1.3)
#>  curl             4.3.2      2021-06-23 [1] CRAN (R 4.1.2)
#>  DBI              1.1.2      2021-12-20 [1] CRAN (R 4.1.2)
#>  dbplyr           2.1.1      2021-04-06 [1] CRAN (R 4.1.2)
#>  dials          * 0.1.0      2022-01-31 [1] CRAN (R 4.1.2)
#>  DiceDesign       1.9        2021-02-13 [1] CRAN (R 4.1.2)
#>  digest           0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  dplyr          * 1.0.8      2022-02-08 [1] CRAN (R 4.1.2)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate         0.14       2019-05-28 [1] CRAN (R 4.1.2)
#>  fansi            0.5.0      2021-05-25 [1] CRAN (R 4.1.2)
#>  farver           2.1.0      2021-02-28 [1] CRAN (R 4.1.2)
#>  fastmap          1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
#>  forcats        * 0.5.1      2021-01-27 [1] CRAN (R 4.1.2)
#>  foreach          1.5.2      2022-02-02 [1] CRAN (R 4.1.2)
#>  Formula          1.2-4      2020-10-16 [1] CRAN (R 4.1.1)
#>  fs               1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
#>  furrr            0.2.3      2021-06-25 [1] CRAN (R 4.1.2)
#>  future           1.24.0     2022-02-19 [1] CRAN (R 4.1.2)
#>  future.apply     1.8.1      2021-08-10 [1] CRAN (R 4.1.2)
#>  generics         0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
#>  ggplot2        * 3.3.5      2021-06-25 [1] CRAN (R 4.1.2)
#>  globals          0.14.0     2020-11-22 [1] CRAN (R 4.1.1)
#>  glue             1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  gower            0.2.2      2020-06-23 [1] CRAN (R 4.1.1)
#>  GPfit            1.0-8      2019-02-08 [1] CRAN (R 4.1.2)
#>  gtable           0.3.0      2019-03-25 [1] CRAN (R 4.1.2)
#>  hardhat          0.2.0      2022-01-24 [1] CRAN (R 4.1.2)
#>  haven            2.4.3      2021-08-04 [1] CRAN (R 4.1.2)
#>  highr            0.9        2021-04-16 [1] CRAN (R 4.1.2)
#>  hms              1.1.1      2021-09-26 [1] CRAN (R 4.1.2)
#>  htmltools        0.5.2      2021-08-25 [1] CRAN (R 4.1.2)
#>  httr             1.4.2      2020-07-20 [1] CRAN (R 4.1.2)
#>  infer          * 1.0.0      2021-08-13 [1] CRAN (R 4.1.2)
#>  inum             1.0-4      2021-04-12 [1] CRAN (R 4.1.2)
#>  ipred            0.9-12     2021-09-15 [1] CRAN (R 4.1.2)
#>  iterators        1.0.13     2020-10-15 [1] CRAN (R 4.1.2)
#>  jsonlite         1.7.2      2020-12-09 [1] CRAN (R 4.1.2)
#>  knitr            1.37       2021-12-16 [1] CRAN (R 4.1.2)
#>  labeling         0.4.2      2020-10-20 [1] CRAN (R 4.1.1)
#>  lattice          0.20-45    2021-09-22 [2] CRAN (R 4.1.2)
#>  lava             1.6.10     2021-09-02 [1] CRAN (R 4.1.2)
#>  lhs              1.1.3      2021-09-08 [1] CRAN (R 4.1.2)
#>  libcoin          1.0-9      2021-09-27 [1] CRAN (R 4.1.2)
#>  lifecycle        1.0.1      2021-09-24 [1] CRAN (R 4.1.2)
#>  listenv          0.8.0      2019-12-05 [1] CRAN (R 4.1.2)
#>  lubridate        1.8.0      2021-10-07 [1] CRAN (R 4.1.2)
#>  magrittr         2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  MASS             7.3-54     2021-05-03 [2] CRAN (R 4.1.2)
#>  Matrix           1.3-4      2021-06-01 [2] CRAN (R 4.1.2)
#>  mime             0.12       2021-09-28 [1] CRAN (R 4.1.1)
#>  modeldata      * 0.1.1      2021-07-14 [1] CRAN (R 4.1.2)
#>  modelr           0.1.8      2020-05-19 [1] CRAN (R 4.1.2)
#>  munsell          0.5.0      2018-06-12 [1] CRAN (R 4.1.2)
#>  mvtnorm          1.1-3      2021-10-08 [1] CRAN (R 4.1.1)
#>  nnet             7.3-16     2021-05-03 [2] CRAN (R 4.1.2)
#>  palmerpenguins * 0.1.0      2020-07-23 [1] CRAN (R 4.1.3)
#>  parallelly       1.30.0     2021-12-17 [1] CRAN (R 4.1.2)
#>  parsnip        * 0.2.0      2022-03-09 [1] CRAN (R 4.1.3)
#>  partykit         1.2-15     2021-08-23 [1] CRAN (R 4.1.2)
#>  pillar           1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
#>  plyr             1.8.6      2020-03-03 [1] CRAN (R 4.1.2)
#>  pROC             1.18.0     2021-09-03 [1] CRAN (R 4.1.2)
#>  prodlim          2019.11.13 2019-11-17 [1] CRAN (R 4.1.2)
#>  purrr          * 0.3.4      2020-04-17 [1] CRAN (R 4.1.2)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.1.2)
#>  Rcpp             1.0.7      2021-07-07 [1] CRAN (R 4.1.2)
#>  readr          * 2.1.2      2022-01-30 [1] CRAN (R 4.1.2)
#>  readxl           1.3.1      2019-03-13 [1] CRAN (R 4.1.2)
#>  recipes        * 0.2.0      2022-02-18 [1] CRAN (R 4.1.2)
#>  reprex           2.0.1      2021-08-05 [1] CRAN (R 4.1.2)
#>  reshape2         1.4.4      2020-04-09 [1] CRAN (R 4.1.2)
#>  rlang          * 1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown        2.12       2022-03-02 [1] CRAN (R 4.1.2)
#>  rpart            4.1-15     2019-04-12 [2] CRAN (R 4.1.2)
#>  rsample        * 0.1.1      2021-11-08 [1] CRAN (R 4.1.2)
#>  rstudioapi       0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  rules          * 0.2.0      2022-03-14 [1] CRAN (R 4.1.2)
#>  rvest            1.0.2      2021-10-16 [1] CRAN (R 4.1.2)
#>  scales         * 1.1.1      2020-05-11 [1] CRAN (R 4.1.2)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  stringi          1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr        * 1.4.0      2019-02-10 [1] CRAN (R 4.1.2)
#>  survival         3.2-13     2021-08-24 [2] CRAN (R 4.1.2)
#>  tibble         * 3.1.6      2021-11-07 [1] CRAN (R 4.1.2)
#>  tidymodels     * 0.1.4      2021-10-01 [1] CRAN (R 4.1.2)
#>  tidyr          * 1.2.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  tidyselect       1.1.2      2022-02-21 [1] CRAN (R 4.1.2)
#>  tidyverse      * 1.3.1      2021-04-15 [1] CRAN (R 4.1.2)
#>  timeDate         3043.102   2018-02-21 [1] CRAN (R 4.1.1)
#>  tune           * 0.1.6      2021-07-21 [1] CRAN (R 4.1.2)
#>  tzdb             0.2.0      2021-10-27 [1] CRAN (R 4.1.2)
#>  utf8             1.2.2      2021-07-24 [1] CRAN (R 4.1.2)
#>  vctrs          * 0.3.8      2021-04-29 [1] CRAN (R 4.1.2)
#>  withr            2.4.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  workflows      * 0.2.4      2021-10-12 [1] CRAN (R 4.1.2)
#>  workflowsets   * 0.1.0      2021-07-22 [1] CRAN (R 4.1.2)
#>  xfun             0.29       2021-12-14 [1] CRAN (R 4.1.2)
#>  xml2             1.3.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  yaml             2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#>  yardstick      * 0.0.9      2021-11-22 [1] CRAN (R 4.1.2)
#> 
#>  [1] C:/Users/dchoy/Documents/R/win-library/4.1
#>  [2] C:/Program Files/R/R-4.1.2/library
#> 
#> ------------------------------------------------------------------------------
@juliasilge juliasilge transferred this issue from tidymodels/tune Mar 25, 2022
@juliasilge
Member

I think something is wrong here. Here is a smaller example:

library(tidymodels)
library(rules)
#> 
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#> 
#>     max_rules
data(two_class_dat)


folds <- vfold_cv(two_class_dat, v = 3)

C5_rules(trees = tune()) %>% 
  set_engine("C5.0") %>%
  tune_grid(Class ~ ., resamples = folds, grid = 20) %>%
  autoplot()

Created on 2022-03-25 by the reprex package (v2.0.1)

Also, notice that the other tuning parameter (`min_n`) is not being picked up:

library(tidymodels)
library(rules)
#> 
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#> 
#>     max_rules

C5_rules(trees = tune(), min_n = tune()) %>% 
  set_engine("C5.0") %>%
  extract_parameter_set_dials()
#> Collection of 1 parameters for tuning
#> 
#>  identifier  type    object
#>       trees trees nparam[+]

Created on 2022-03-25 by the reprex package (v2.0.1)

@juliasilge juliasilge added the bug an unexpected problem or unintended behavior label Mar 25, 2022
@DesmondChoy
Author

Yeah, `min_n` not being tunable is linked to this other issue.

@topepo
Member

topepo commented Jun 1, 2022

I think that the model fit is not being returned properly.

library(tidymodels)
library(rules)
#> 
#> Attaching package: 'rules'
#> The following object is masked from 'package:dials':
#> 
#>     max_rules

data(two_class_dat)


folds <- vfold_cv(two_class_dat, v = 3)

res <- 
  C5_rules(trees = tune()) %>% 
  set_engine("C5.0", control = C5.0Control(earlyStopping = FALSE)) %>%
  tune_grid(Class ~ ., resamples = folds, grid = 20, control = control_grid(extract = I)) 

res$.extracts[[1]]$.extracts[[1]] %>% extract_fit_engine() 
#> C5.0 Model Specification ()

Created on 2022-05-31 by the reprex package (v2.0.1)

I'll take a look.

@topepo
Member

topepo commented Jun 1, 2022

The model call itself is built correctly, and the tuned value does get passed through (`trees` becomes `trials`), so that's not the issue:

> C5_rules(trees = 11) %>% translate()
C5.0 Model Specification (classification)

Main Arguments:
  trees = 11

Computational engine: C5.0 

Model fit template:
rules::c5_fit(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
    trials = 11)

This is true when you look at the extracts too:

> res$.extracts[[1]]$.extracts[[1]] %>% extract_fit_engine() %>%  pluck("trials")
Requested    Actual 
       99        99 
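Some background on why one correct fit can still produce identical metrics for every `trees` value. As I understand tune's "submodel trick" for boosted C5.0 models, each resample is fit once at the largest `trees` value (the 99 trials above), and predictions for smaller candidates come from using only the first few boosting iterations of that single fit. If the requested number of iterations never reaches the prediction step, every candidate is scored with the full ensemble and the metrics come out flat, which is the symptom reported here. The base-R toy below illustrates that mechanism only: the hand-made vote matrix, plain majority voting (C5.0 actually weights its boosting votes), and the names `predict_submodel()` / `predict_submodel_buggy()` are all hypothetical, and the thread does not say which two lines were at fault.

```r
# Toy boosted ensemble (hypothetical, hand-made): rows are observations,
# columns are boosting iterations, entries are the class each iteration votes for.
votes <- rbind(
  c(2, 1, 1, 1, 1),
  c(1, 1, 1, 1, 1),
  c(1, 2, 2, 2, 2),
  c(2, 2, 2, 2, 2)
)
truth <- c(1, 1, 2, 2)

# Majority vote over the first `trees` iterations only. The request is clamped
# to the iterations actually fitted, since C5.0 can report fewer "Actual" than
# "Requested" trials when boosting stops early.
predict_submodel <- function(votes, trees) {
  trees <- min(trees, ncol(votes))
  used <- votes[, seq_len(trees), drop = FALSE]
  apply(used, 1, function(v) as.numeric(names(which.max(table(v)))))
}

# Buggy variant: the requested number of trees never reaches the predictor,
# so every candidate is scored with the full ensemble.
predict_submodel_buggy <- function(votes, trees) {
  predict_submodel(votes, ncol(votes))
}

# One fit, several "submodels": score each candidate value of trees
candidates <- c(1, 3, 5)
accuracy <- function(pred) mean(pred == truth)
sapply(candidates, function(k) accuracy(predict_submodel(votes, k)))
#> [1] 0.5 1.0 1.0
sapply(candidates, function(k) accuracy(predict_submodel_buggy(votes, k)))
#> [1] 1 1 1
```

In the buggy variant the metric no longer depends on `trees` at all, which is exactly what a flat tuning plot looks like.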

@topepo
Member

topepo commented Jun 1, 2022

It's these two lines. I'll work on a PR.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jun 16, 2022