varImp includes regressors that are explicitly excluded from model #493

Closed
crossxwill opened this issue Sep 22, 2016 · 3 comments
@crossxwill

For a train object fit with the glmboost method, the varImp results include regressors that were not considered during training.

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server release 6.8 (Santiago)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.3.0

Some fake data for a reproducible example:

###### fake data ------

set.seed(123)

x1 <- rnorm(100)
x2 <- rnorm(100)
x3 <- rnorm(100)
x4 <- rnorm(100)

set.seed(999)

## y depends on x1, x2, and x3 only; x4 is pure noise
y <- 50 + 5*x1 + 5*x2 + 10*x3 + rnorm(100)

df <- data.frame(y = y, x1 = x1, x2 = x2, x3 = x3, x4 = x4)

Some caret code with glmboost:

##### glmboost ----
library(caret)

## tuning grid: mstop = number of boosting iterations, no post-hoc pruning
searchGrid <- data.frame(mstop = seq(1, 50, by = 1), prune = rep('no', times = 50))

set.seed(321)
## note: only x1 appears in the model formula
glmboost.mod <- train(y ~ x1, data = df, method = 'glmboost', tuneGrid = searchGrid)

coef(glmboost.mod$finalModel)  # coefficients of the final boosted model

varImp(glmboost.mod)           # variable importance

plot(glmboost.mod)             # tuning profile over mstop

The final model chose x1 (as expected), but the varImp results show x4, x3, and x2, which should have been excluded.

> coef(glmboost.mod$finalModel)
(Intercept)          x1 
  -0.192532    2.129640 
attr(,"offset")
[1] 51.01175
> 
> varImp(glmboost.mod)
loess r-squared variable importance

   Overall
x3 100.000
x2  28.308
x1   5.913
x4   0.000
@topepo
Owner

topepo commented Oct 20, 2016

It is not obvious, but the output says "loess r-squared variable importance". This means that there is no implemented code for estimating importance for this model, so it defaults to a univariate assessment that is not related to boosted GLMs.
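
For reference, here is a minimal sketch of what that univariate fallback amounts to, assuming it is equivalent to caret::filterVarImp() with a loess smoother (an illustration of the fallback idea, not the exact internal call):

library(caret)

## loess R-squared of y against each predictor, computed with no reference to the glmboost fit
filterVarImp(x = df[, c("x1", "x2", "x3", "x4")], y = df$y, nonpara = TRUE)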

I just added a method to the model code that can be used.
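
As an illustration only (an assumption, not necessarily the method that was added), a model-specific importance for glmboost could be built from the coefficients the boosting algorithm actually selected, so predictors it never chose get no importance:

cf  <- coef(glmboost.mod$finalModel)          # only selected terms appear here
cf  <- cf[setdiff(names(cf), "(Intercept)")]  # drop the intercept
imp <- abs(cf)                                # magnitude of each selected coefficient
data.frame(Overall = 100 * imp / max(imp))    # rescale to 0-100, as varImp() does by default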

@crossxwill
Author

Thank you. It might be a good idea to label it "Univariate r-squared (loess)".


@topepo
Owner

topepo commented Nov 2, 2016

Good idea. I made a change to the labels. Thanks.

@topepo topepo closed this as completed Nov 2, 2016