Added predleaf to xgbTree for predict #318
Conversation
What was the thought behind adding predleaf? For the random grid search, you can get rid of the […]. Also, I did some testing and I don't get the same answers as the code that doesn't use the sub-model trick. Here is my testing code:
library(caret)

modelInfo2 <- modelInfo
modelInfo2$loop <- NULL

###################################################################

small <- expand.grid(max_depth = c(1, 10),
                     nrounds = c(10, 100, 500),
                     eta = .3,
                     gamma = 0,
                     colsample_bytree = .6,
                     min_child_weight = 1)

###################################################################

set.seed(46)
dat <- twoClassSim(200)

seeds <- vector(mode = "list", length = 26)
seeds <- lapply(seeds, function(x) 1:40)

set.seed(1)
mod1 <- train(Class ~ ., data = dat,
              method = modelInfo,
              tuneGrid = small,
              trControl = trainControl(seeds = seeds,
                                       savePredictions = TRUE,
                                       classProbs = TRUE))

set.seed(1)
mod2 <- train(Class ~ ., data = dat,
              method = modelInfo2,
              tuneGrid = small,
              trControl = trainControl(seeds = seeds,
                                       savePredictions = TRUE,
                                       classProbs = TRUE))

all.equal(mod1$results$Accuracy, mod2$results$Accuracy)
summary(mod1$results$Accuracy - mod2$results$Accuracy)
mod2$times$everything[3]/mod1$times$everything[3]

I get:

> all.equal(mod1$results$Accuracy, mod2$results$Accuracy)
[1] "Mean relative difference: 0.00694488"
> summary(mod1$results$Accuracy-mod2$results$Accuracy)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-0.005201 -0.001801 0.000000 0.001264 0.003199 0.010920 They aren't large differences but they shouldn't be there. It looks like the results are the same for some tuning parameters, specifically when the number of rounds are small. If I re-run I did my checking wrong I can get the same answers from > library(xgboost)
On a re-run it turns out I did my checking wrong; I can get the same answers directly from xgboost:

> library(xgboost)
>
> data(agaricus.train, package='xgboost')
> data(agaricus.test, package='xgboost')
> train <- agaricus.train
> test <- agaricus.test
>
> set.seed(1)
> bst1 <- xgboost(data = train$data, label = train$label, max.depth = 2,
+ eta = 1, nthread = 1, nround = 100, objective = "binary:logistic")
[0] train-error:0.046522
[1] train-error:0.022263
[2] train-error:0.007063
[3] train-error:0.015200
[4] train-error:0.007063
[5] train-error:0.001228
[6] train-error:0.001228
[7] train-error:0.001228
[8] train-error:0.001228
[9] train-error:0.000000
<snip>
[99] train-error:0.000000
> pred_1_50 <- predict(bst1, test$data, ntreelimit = 50)
>
> set.seed(1)
> bst2 <- xgboost(data = train$data, label = train$label, max.depth = 2,
+ eta = 1, nthread = 1, nround = 50, objective = "binary:logistic")
[0] train-error:0.046522
[1] train-error:0.022263
[2] train-error:0.007063
[3] train-error:0.015200
[4] train-error:0.007063
<snip>
> pred_2_50 <- predict(bst2, test$data)
> pred_2_50_ntl <- predict(bst2, test$data, ntreelimit = 50)
>
> all.equal(pred_2_50, pred_1_50)
[1] TRUE
> all.equal(pred_2_50, pred_2_50_ntl)
[1] TRUE
>
> sessionInfo()
R version 3.2.2 Patched (2015-10-19 r69547)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.9.5 (Mavericks)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xgboost_0.4-2
loaded via a namespace (and not attached):
[1] magrittr_1.5 Matrix_1.2-2 tools_3.2.2 grid_3.2.2 data.table_1.9.6
[6] stringr_0.6.2 chron_2.3-45 lattice_0.20-33

I'll look more at this tomorrow.
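The transcript above confirms that, at the xgboost level, predict() with ntreelimit = 50 on a 100-round fit matches a dedicated 50-round fit. For comparison, here is a minimal sketch of how a caret-style predict module can serve sub-models through ntreelimit; the function name is hypothetical and the output handling is simplified (a real classification module would also convert probabilities to class labels), so this is an illustration of the idea, not the PR's code.

## Hypothetical caret-style predict module: the first element of the
## result uses the full fit; each sub-model row is scored from the
## same booster with a reduced ntreelimit.
xgbSubmodelPred <- function(modelFit, newdata, submodels = NULL) {
  out <- predict(modelFit, newdata)
  if (!is.null(submodels)) {
    tmp <- vector(mode = "list", length = nrow(submodels) + 1)
    tmp[[1]] <- out
    for (j in seq(along = submodels$nrounds))
      tmp[[j + 1]] <- predict(modelFit, newdata,
                              ntreelimit = submodels$nrounds[j])
    out <- tmp
  }
  out
}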
That's weird, but fortunately it's not a huge difference. I am not sure how the […]
Could you re-run the Travis build? I did what you suggested but have no idea why it failed. Also, this might be related to your question: dmlc/xgboost#310
Any progress on this yet? @topepo
Is this the correct way to add it?
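For reference, this is roughly what the new option exposes: in the xgboost R API, predict() with predleaf = TRUE returns leaf indices instead of predictions. A minimal, self-contained sketch using the same agaricus data as above (behavior as of xgboost 0.4-2, the version in the sessionInfo, is assumed):

library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label,
               max.depth = 2, eta = 1, nthread = 1, nround = 5,
               objective = "binary:logistic")

## With predleaf = TRUE, the result is a matrix with one row per test
## observation and one column per boosting round; each entry is the
## index of the leaf that the observation lands in for that tree.
leaves <- predict(bst, agaricus.test$data, predleaf = TRUE)
dim(leaves)    # nrow(agaricus.test$data) x 5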