ROC < .5 in filterVarImp #565

Closed
topepo opened this issue Jan 5, 2017 · 0 comments

topepo commented Jan 5, 2017

filterVarImp has no way of knowing the direction in which a predictor should be used. An ROC AUC of 0.3 indicates an important predictor (but for the non-reference class). The reported value should therefore always be greater than 0.50, even though some non-informative predictors will have raw values slightly less than 0.5.

> library(caret)
> 
> set.seed(1)
> dat <- twoClassSim(100)
> dat$Class <- relevel(dat$Class, ref = "Class2")
> filterVarImp(dat[, -ncol(dat)], dat$Class)
              Class2    Class1
TwoFactor1 0.4848485 0.4848485
TwoFactor2 0.6506239 0.6506239
Linear01   0.4585561 0.4585561
Linear02   0.2838681 0.2838681
Linear03   0.7237077 0.7237077
Linear04   0.4425134 0.4425134
Linear05   0.5922460 0.5922460
Linear06   0.4295900 0.4295900
Linear07   0.4821747 0.4821747
Linear08   0.3801248 0.3801248
Linear09   0.5436720 0.5436720
Linear10   0.4073084 0.4073084
Nonlinear1 0.5597148 0.5597148
Nonlinear2 0.4416221 0.4416221
Nonlinear3 0.5405526 0.5405526

Looking, for example, at Linear02:

ggplot(dat, aes(x = Class, y = Linear02)) + geom_boxplot()

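There is a significant shift between the classes (as was designed in the simulation), so the value of 0.284 reported for Linear02 reflects a strong predictor whose direction is reversed, not a weak one.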
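To make the expected behavior concrete, here is a minimal base-R sketch of a direction-free AUC. It is an illustration only and not necessarily how caret computes or corrects the value: `auc_rank` and `auc_folded` are hypothetical helper names, and the data are simulated here rather than taken from `twoClassSim`.

```r
# Rank-based (Mann-Whitney) AUC for a numeric predictor x and a
# two-level factor y. `event` names the level treated as the positive class.
# NOTE: hypothetical helper for illustration, not part of caret.
auc_rank <- function(x, y, event = levels(y)[1]) {
  pos <- x[y == event]
  neg <- x[y != event]
  r <- rank(c(pos, neg))  # midranks, so ties count as one half
  n_pos <- length(pos)
  n_neg <- length(neg)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Direction-free version: fold any value below 0.5 back above it.
# NOTE: also a hypothetical helper.
auc_folded <- function(x, y) {
  a <- auc_rank(x, y)
  max(a, 1 - a)
}

set.seed(1)
y <- factor(rep(c("Class1", "Class2"), each = 50))
# Class2 is shifted upward, so the predictor separates the classes well.
x <- rnorm(100, mean = ifelse(y == "Class1", 0, 1.5))

raw    <- auc_rank(x, y)    # Class1 as the event: falls below 0.5
folded <- auc_folded(x, y)  # same separation, reported above 0.5
print(c(raw = raw, folded = folded))
```

Folding with `max(a, 1 - a)` treats a predictor that separates the classes in either direction as equally important, which is the property the issue asks for.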

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: macOS Sierra 10.12.2

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] caret_6.0-73    ggplot2_2.2.0   lattice_0.20-34

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.8        magrittr_1.5       splines_3.3.2      MASS_7.3-45        munsell_0.4.3     
 [6] colorspace_1.3-1   foreach_1.4.3      minqa_1.2.4        stringr_1.1.0      car_2.1-4         
[11] plyr_1.8.4         tools_3.3.2        parallel_3.3.2     nnet_7.3-12        pbkrtest_0.4-6    
[16] grid_3.3.2         gtable_0.2.0       nlme_3.1-128       mgcv_1.8-15        quantreg_5.29     
[21] e1071_1.6-7        class_7.3-14       MatrixModels_0.4-1 iterators_1.0.8    lme4_1.1-12       
[26] lazyeval_0.2.0     assertthat_0.1     tibble_1.2         Matrix_1.2-7.1     nloptr_1.0.4      
[31] reshape2_1.4.2     ModelMetrics_1.1.0 codetools_0.2-15   stringi_1.1.2      scales_0.4.1      
[36] stats4_3.3.2       SparseM_1.74    
topepo pushed a commit that referenced this issue Jan 13, 2017
@topepo topepo closed this as completed Jan 13, 2017