
xgboost mtry parameter swap for #495 #499

Merged · 2 commits merged into master from xgb-mtry on May 21, 2021

Conversation

@topepo (Member) commented May 19, 2021

closes #495
closes #461

colsample_bytree remains an argument to xgb_train(). The mtry parameter in boost_tree() now points to colsample_bynode.
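
To make that concrete, here's a minimal sketch (illustrative values only, not code from this PR) of how the mapping looks from the user's side:

```r
# Illustrative sketch: after this change, the `mtry` main argument maps to
# xgboost's `colsample_bynode`, while `colsample_bytree` can still be passed
# through as an engine-specific argument.
library(parsnip)

spec <- boost_tree(mtry = 3, trees = 500) %>%        # mtry -> colsample_bynode
  set_engine("xgboost", colsample_bytree = 0.8) %>%  # still reaches xgb_train()
  set_mode("regression")
```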

We might want to add more engine-specific tunables to tune for this engine.

@mdancho84 I think that modeltime would need the same switch.

(edit for clarity)

@topepo requested a review from @juliasilge on May 19, 2021 at 14:50
@juliasilge (Member)

So now both of these parameters are going to have the problem outlined in #461

@juliasilge (Member)

I reran some analyses with xgboost, including this one, and the results are different in about the ways I think we'd expect:

[Image: new_mtry]

No big changes in overall performance after tuning, but it switched some things in the variable importance and such. People who come back to train with the same data after this change are going to notice.

@topepo merged commit 46a2018 into master on May 21, 2021
@topepo deleted the xgb-mtry branch on May 21, 2021 at 13:53
@mdancho84 (Contributor)

I'm here now - just seeing this. Yep, Modeltime will need the switch (I believe).

@mdancho84 (Contributor) commented May 25, 2021

Hey, I've reviewed it and I see one potential issue with backwards compatibility. Models that were specified with a value of 1, on the assumption that it means 100% of the columns, will now be converted to using a single column, which makes model performance very bad.

I recommend handling 1 as 100% of the columns, not as one column.

The case in which a user actually wants to use only 1 column should be rare, and handling 1 as 100% is consistent with the underlying xgboost::xgb.train() function.
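
For reference, a rough sketch (made-up data, illustrative values) of how plain xgboost treats that value:

```r
# Rough sketch with illustrative values: in xgboost itself, colsample_bynode
# is a proportion in (0, 1], so a value of 1 means "sample 100% of the columns
# at each split", not "use one column".
library(xgboost)

dtrain <- xgb.DMatrix(as.matrix(mtcars[, -1]), label = mtcars$mpg)

fit <- xgb.train(
  params  = list(objective = "reg:squarederror", colsample_bynode = 1),
  data    = dtrain,
  nrounds = 10
)
```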

@topepo (Member, Author) commented May 25, 2021

The previous behavior of "1.00 means 100%, but otherwise it is a count" was a big mistake on my part.

Treating 1 as 100% is consistent with xgboost, but completely inconsistent with randomForest, ranger, gbm, C50, and so on. We want to avoid people having to think "but for xgboost I have to do this instead of that". There are probably 4-5 things in the overall xgboost API that are irregular (and we fix them).
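
For comparison, a quick sketch (illustrative only) of the count-based convention those other engines use, which is what mtry now follows for xgboost as well:

```r
# Illustrative sketch: in count-based engines such as ranger, mtry = 1 means
# exactly one predictor is considered at each split, not 100% of them.
library(ranger)

ranger(mpg ~ ., data = mtcars, mtry = 1)
```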

@mdancho84 (Contributor)

We are good to go. Let me know when parsnip is accepted by CRAN, and I will update modeltime.

@github-actions (bot) commented Jun 9, 2021

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions bot locked and limited conversation to collaborators on Jun 9, 2021
Development

Successfully merging this pull request may close these issues.

- mtry maps to wrong parameter for XGBoost
- when mtry = 1 all predictors are used in xgboost engine