Add engine specification field for predictor encodings #319

juliasilge · 2020-05-26T19:16:49Z

Closes #290.

This PR adds a field to the engine specification for engine-specific predictor encodings (only dummy/indicator variables for now).

For example, the encoding options for ranger are:

set_encoding(
  model = "rand_forest",
  eng = "ranger",
  mode = "regression",
  options = list(predictor_indicators = FALSE)
)

While the encoding options for vanilla logistic regression are:

set_encoding(
  model = "logistic_reg",
  eng = "glm",
  mode = "classification",
  options = list(predictor_indicators = TRUE)
)

These changes depend on the handling of data argument names implemented in #315 and #316.

These encodings can be used in workflows so that the user experiences the same behavior around dummy variable creation in both parsnip and workflows.
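To illustrate the idea of a per-engine lookup driving dummy variable creation, here is a minimal base R sketch. It is not parsnip's or workflows' implementation: the `encodings` table and the helpers `uses_indicators()` and `make_predictors()` are hypothetical names invented for this example, and only the two model/engine pairs shown above are registered.

```r
# Hypothetical stand-in for an engine encoding registry (not parsnip internals).
encodings <- data.frame(
  model = c("rand_forest", "logistic_reg"),
  engine = c("ranger", "glm"),
  mode = c("regression", "classification"),
  predictor_indicators = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Look up whether a model/engine/mode combination expects indicator columns.
uses_indicators <- function(model, engine, mode) {
  hit <- encodings$model == model &
    encodings$engine == engine &
    encodings$mode == mode
  if (sum(hit) != 1) {
    stop("No unique encoding registered for this model/engine/mode.")
  }
  encodings$predictor_indicators[hit]
}

# Build predictors from a formula, expanding factors only when requested.
make_predictors <- function(formula, data, indicators) {
  if (indicators) {
    x <- model.matrix(formula, data)
    # Drop the intercept column so only predictor columns remain.
    x[, colnames(x) != "(Intercept)", drop = FALSE]
  } else {
    # Keep factor columns untouched; formula[[3]] is the right-hand side.
    data[, all.vars(formula[[3]]), drop = FALSE]
  }
}

f <- Sepal.Length ~ Species + Sepal.Width

x_glm <- make_predictors(
  f, iris,
  uses_indicators("logistic_reg", "glm", "classification")
)
x_ranger <- make_predictors(
  f, iris,
  uses_indicators("rand_forest", "ranger", "regression")
)

colnames(x_glm)    # Species expanded into indicator columns
names(x_ranger)    # Species kept as a single factor column
```

The same lookup could be consulted by any preprocessing layer, which is the kind of consistency between parsnip and workflows that this PR is after.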

I am pretty confident that predictor_indicators = TRUE / FALSE is correct for all the model + engine combinations except for liquidSVM. I was having trouble getting output from those models and could use some double-checking.

juliasilge · 2020-05-26T19:19:50Z

This PR also fixes the function used with Spark decision trees 🌳 for regression.

github-actions · 2021-03-07T00:28:47Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

topepo and others added 26 commits April 29, 2020 21:29

initial work on #290

c1dbac0

merge origin/master

5494a32

Fix more tests for tidyr and tibble

eeac337

Fine-tune documentation

16a0c85

Engine encoding for logistic_reg()

99137ac

Do not want inner names

27246a6

Indicator variables for MARS

972a796

Look up predictor indicator; use in convert_form_to_xy_fit()

cd4ed77

Test indicators = FALSE compared to a model that does not create indicator variables

64bbde0

Set predictor indicators for xgboost (TRUE) and C5.0 (FALSE)

e02573f

Set predictor encodings for Spark (TRUE).

ad6ac8c

Add glue. Closes #296.

2c52d2c

For null model, set predictor indicators to... FALSE? 🤔

c6d0d35

Also need engine to find the indicator encoding

d3ee6de

Decision tree predictors = FALSE

47544b4

Neural nets all TRUE for indicators

ccee213

Predictor indicators for kknn

3780454

Predictor indicators for multinomial classification

1779777

Random forest predictor indicators

7d87e45

Survival models make indicators

172541c

Change kernlab to use formula interface, add indicator encoding

115292d

Change svm_rbf (kernlab) to formula interface, add indicator encodings

2e8d113

Update tests for kernlab formula interface

a420749

Merge IT ALL

3a3c134

Merge branch 'master' into encoding-options # Conflicts: # R/linear_reg_data.R # R/svm_poly_data.R # R/svm_rbf_data.R # tests/testthat/test_svm_poly.R # tests/testthat/test_svm_rbf.R

Spark *always* makes indicator variables, fix dependency for Spark + decision tree

3c8481e

Fix function used with Spark decision tree for regression

1160e1e

topepo reviewed May 26, 2020

Review comment on R/mars_data.R (outdated)

Change to predictor_indicators = FALSE for MARS models

534987e

topepo merged commit aa29bac into master May 29, 2020

This was referenced Jun 2, 2020

Add engine specific predictor encodings tidymodels/workflows#51

Merged

Unexpectedly different behavior for factors/dummy variables between parsnip and workflows #326

Closed

github-actions bot locked and limited conversation to collaborators Mar 7, 2021

juliasilge deleted the encoding-options branch June 27, 2021 16:08
