glmboost issue with verboseIter and parameters #396

Closed
4 tasks done
pverspeelt opened this issue Mar 26, 2016 · 1 comment

@pverspeelt commented Mar 26, 2016

The code below trains two glmboost models. The only difference between them is the order of the levels of the prune parameter: in the first model the order is "yes", "no"; in the second it is reversed. The issue is that the training log only shows the 3 folds for mstop = 150 with whichever prune value is listed first, and the final tuning selects mstop = 100, again with whichever prune value is listed first. Looking at the prints of both models, it appears that the same tuning parameters were chosen in both cases and that the prune parameter was ignored.

A related point is the prune parameter itself: it looks like it has no effect. If that is the case, shouldn't it be removed as a tuning option?

Minimal, reproducible example:

library(caret)
library(mlbench)
library(mboost)

data(Sonar)

set.seed(25)
trainIndex = createDataPartition(Sonar$Class, p = 0.9, list = FALSE)
training = Sonar[ trainIndex,]
testing  = Sonar[-trainIndex,]

### set training parameters
fitControl = trainControl(method = "cv",
                          number = 3,
                          ## Estimate class probabilities
                          classProbs = TRUE,
                          verboseIter = TRUE,
                          ## Evaluate two-class performance
                          ## (ROC, sensitivity, specificity) using the following function
                          summaryFunction = twoClassSummary)

### train the models

# Use expand.grid to specify the search space
glmBoostGrid1 = expand.grid(mstop = c(50, 100, 150),
                           prune = c("yes", "no"))

set.seed(4242)
glmBoostFit1 = train(Class ~ ., 
                    data = training,
                    method = "glmboost",
                    trControl = fitControl,
                    tuneGrid = glmBoostGrid1,
                    metric = "ROC")

print(glmBoostFit1)


glmBoostGrid2 = expand.grid(mstop = c(50, 100, 150),
                           prune = c("no", "yes"))

set.seed(4242)
glmBoostFit2 = train(Class ~ ., 
                     data = training,
                     method = "glmboost",
                     trControl = fitControl,
                     tuneGrid = glmBoostGrid2,
                     metric = "ROC")


print(glmBoostFit2)

Session Info:

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

locale:
[1] LC_COLLATE=Dutch_Netherlands.1252  LC_CTYPE=Dutch_Netherlands.1252    LC_MONETARY=Dutch_Netherlands.1252 LC_NUMERIC=C                      
[5] LC_TIME=Dutch_Netherlands.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mboost_2.6-0    stabs_0.5-1     mlbench_2.1-1   caret_6.0-64    ggplot2_2.1.0   lattice_0.20-33

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.3        compiler_3.2.3     nloptr_1.0.4       plyr_1.8.3         iterators_1.0.8    tools_3.2.3        lme4_1.1-11       
 [8] nlme_3.1-126       gtable_0.2.0       mgcv_1.8-12        Matrix_1.2-4       foreach_1.4.3      SparseM_1.7        mvtnorm_1.0-5     
[15] coin_1.1-2         stringr_1.0.0      pROC_1.8           MatrixModels_0.4-1 stats4_3.2.3       grid_3.2.3         nnet_7.3-12       
[22] survival_2.38-3    multcomp_1.4-4     TH.data_1.0-7      minqa_1.2.4        reshape2_1.4.1     car_2.1-1          magrittr_1.5      
[29] scales_0.4.0       codetools_0.2-14   modeltools_0.2-21  MASS_7.3-45        splines_3.2.3      nnls_1.4           pbkrtest_0.4-6    
[36] strucchange_1.5-1  colorspace_1.2-6   quantreg_5.21      quadprog_1.5-5     sandwich_2.3-4     stringi_1.0-1      party_1.0-25      
[43] munsell_0.4.3      zoo_1.7-12        
topepo added a commit that referenced this issue Mar 30, 2016

topepo commented Mar 30, 2016

It isn't alarming that the verbose logging didn't print out every model; we use something called the sub-model trick here. If you want to evaluate models with 10, 50, and 100 boosting iterations, we don't have to fit all three models, only the last one; predictions for the smaller values come from that single fit. This can save a ton of time when tuning the model.
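
For intuition, here is a minimal sketch of that idea using mboost directly (an illustration, not caret's internal code): fit once at the largest mstop and derive predictions for the smaller values from that single fit.

library(mboost)

## Fit once with the largest number of boosting iterations
fit <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 100))

pred_100 <- predict(fit)      # predictions at mstop = 100
pred_50  <- predict(fit[50])  # restrict the same fit to 50 iterations; no refit

## Note: as discussed below, the subsetting also changes `fit` itself in memory.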

Also:

  • the default tuning grid fixes prune = "no". You have the option to tune it (as you showed), but it defaults to a single value (see the sketch after this list).
  • it is very possible to get exactly the same answers with and without pruning: if the AIC statistic says that the largest number of iterations is optimal (by that metric), then pruning doesn't eliminate any iterations.
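
One way to check this default yourself is to inspect the model's registration in caret (hedged: the exact contents may differ between caret versions):

library(caret)

## Look up the glmboost model definition that caret uses
info <- getModelInfo("glmboost", regex = FALSE)$glmboost
info$parameters   # the tunable parameters: mstop and prune
info$grid         # the function building the default grid, with prune fixed at "no"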

However, there was a bug here. The mboost package does something very atypical for R by changing the object in memory without the object being re-assigned. For example:

> cars.gb <- glmboost(dist ~ speed, data = cars,
+                     control = boost_control(mstop = 2000),
+                     center = FALSE)
> cars.gb

     Generalized Linear Models Fitted via Gradient Boosting

Call:
glmboost.formula(formula = dist ~ speed, data = cars, center = FALSE,     control = boost_control(mstop = 2000))


     Squared Error (Regression) 

Loss function: (y - f)^2 


Number of boosting iterations: mstop = 2000 
Step size:  0.1 
Offset:  42.98 

Coefficients: 
(Intercept)       speed 
 -60.331204    3.918359 
attr(,"offset")
[1] 42.98

> 
> ### initial number of boosting iterations
> mstop(cars.gb)
[1] 2000
> ### look at the model after only 10 iterations:
> cars.gb[10]

     Generalized Linear Models Fitted via Gradient Boosting

Call:
glmboost.formula(formula = dist ~ speed, data = cars, center = FALSE,     control = boost_control(mstop = 2000))


     Squared Error (Regression) 

Loss function: (y - f)^2 


Number of boosting iterations: mstop = 10 
Step size:  0.1 
Offset:  42.98 

Coefficients: 
(Intercept)       speed 
 -0.6546347   0.2338597 
attr(,"offset")
[1] 42.98

> 

> mstop(cars.gb)
[1] 10

This is documented in ?mstop (although I didn't see it originally):

The [.mboost function can be used to enhance or restrict a given boosting model to the specified boosting iteration i. Note that in both cases the original x will be changed to reduce the memory footprint. If the boosting model is enhanced by specifying an index that is larger than the initial stop, only the missing i - stop steps are fitted. If the model is restricted, the spare steps are not dropped, i.e., if we increase i again, these boosting steps are immediately available. Alternatively, the same operation can be done by mstop(x) <- i.
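
To make the consequence concrete, here is a hedged sketch (an illustration of the needed bookkeeping, not the actual caret patch) of looping over several mstop values without permanently shrinking the fit:

library(mboost)

fit <- glmboost(dist ~ speed, data = cars,
                control = boost_control(mstop = 150))
orig_mstop <- mstop(fit)

preds <- list()
for (i in c(50, 100, 150)) {
  mstop(fit) <- i                         # restricts `fit` itself, in place
  preds[[as.character(i)]] <- predict(fit, newdata = cars)
}

mstop(fit) <- orig_mstop   # restore; the spare steps were kept, so this is cheap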

I made changes to this model and to gamboost so that they work properly. You can get the updated model code from the changes referenced above, or wait until the new version is on CRAN (I think in about 2 weeks).

topepo closed this as completed Mar 30, 2016