xgbDART #742
Conversation
Codecov Report
@@           Coverage Diff           @@
##           master     #742   +/-   ##
=======================================
  Coverage   16.97%   16.97%
=======================================
  Files          90       90
  Lines       13187    13187
=======================================
  Hits         2238     2238
  Misses      10949    10949

Continue to review full report at Codecov.
models/files/xgbDART.R
Outdated
if (!is.null(modelFit$param$objective) && modelFit$param$objective == 'binary:logitraw') {
  p <- predict(modelFit, newdata)
  out <- exp(p)/(1 + exp(p))
Using `out <- binomial()$linkinv(p)` would be better, since it takes potential numerical issues into account.
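For illustration, a minimal standalone sketch (not the model file itself) of the numerical issue: the hand-rolled logistic transform overflows for large raw scores, while `binomial()$linkinv` stays well-defined.

```r
## Raw 'binary:logitraw' scores, including extreme values.
p <- c(-800, -5, 0, 5, 800)

naive  <- exp(p) / (1 + exp(p))   # NaN at p = 800: exp(800) = Inf, Inf/Inf = NaN
stable <- binomial()$linkinv(p)   # clamped near 0 and 1, never NaN

cbind(p, naive, stable)
```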
models/files/xgbDART.R
Outdated
"Minimum Loss Reduction", | ||
"Subsample Percentage", | ||
"Subsample Ratio of Columns", | ||
"Fraction of previous trees to drop during dropout", |
Can you shorten these and use consistent capitalization (e.g. maybe "Fraction of Previous Trees")? The labels might get used in ggplot legends or facets, and long labels could be an issue.
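For illustration only, a shortened, consistently capitalized set along the lines suggested; these exact strings are an assumption, not the labels that were ultimately merged:

```r
## Hypothetical shortened labels -- illustrative only, not the final
## strings in models/files/xgbDART.R.
labels <- c("Minimum Loss Reduction",
            "Subsample Percentage",
            "Subsample Ratio of Columns",
            "Fraction of Previous Trees")
```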
pkg/caret/DESCRIPTION
Outdated
@@ -1,5 +1,5 @@
 Package: caret
-Version: 6.0-77
+Version: 6.0-78
I just revved the file to bump the version up so this isn't needed.
It looks good. I had a few minor notes that you should see.
No problem, all very reasonable; implemented.
Thanks!
'xgboost' offers a third booster type option, DART. It allows controlling under/over-fitting via dropout: trees added merely to correct trivial errors can be prevented. Relevant reference by Rashmi & Gilad-Bachrach here. All tests in RegressionTests/Code work fine. (The standard warning when passing `xgb.DMatrix` as input remains.) Due to its design (it has to traverse all the previous trees before making the "next fit"), it is slower than `xgbTree`.

Comment: I have found it to be good in terms of `varImp` insights. Some artificial noise variables that tricked `xgbTree` were weeded out by `xgbDART` in some toy examples I tried.
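For context, a minimal sketch of fitting the new model through caret once merged. The tuning-grid column names below are an assumption (they mirror xgboost's DART options) rather than something quoted from the PR itself.

```r
## Sketch only: assumes the xgbDART module exposes xgboost's DART tuning
## parameters (rate_drop, skip_drop, ...) and that xgboost is installed.
library(caret)

set.seed(1)
dat <- twoClassSim(200)   # toy two-class data shipped with caret

grid <- expand.grid(nrounds          = 100,
                    max_depth        = 3,
                    eta              = 0.3,
                    gamma            = 0,
                    subsample        = 0.8,
                    colsample_bytree = 0.8,
                    rate_drop        = 0.1,  # fraction of trees dropped per round
                    skip_drop        = 0.5,  # probability of skipping dropout
                    min_child_weight = 1)

fit <- train(Class ~ ., data = dat,
             method    = "xgbDART",
             tuneGrid  = grid,
             trControl = trainControl(method = "cv", number = 5))
```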