RFE crashes when it gets down to one variable #543

Closed
sco-lo-digital opened this issue Dec 2, 2016 · 3 comments

Comments

@sco-lo-digital

Hi Max

If I run the below, it works.

library(mlbench)
library(caret)
data("PimaIndiansDiabetes")
ctrl <- rfeControl(
  functions = lrFuncs,
  #functions = rfFuncs,
  method = 'boot',
  number = 5,
  rerank = TRUE,
  verbose = TRUE
)

varSeq <- c(2:4)
set.seed(721)
lrRFE <- rfe(PimaIndiansDiabetes[,-9],
             PimaIndiansDiabetes[,9],
             sizes = varSeq,
             metric = "ROC",
             rfeControl = ctrl)
lrRFE

But if I reduce the lowest value in sizes (i.e. varSeq) to 1, it crashes with the error:

"Error in { : task 1 failed - "replacement has 1 row, data has 0"
In addition: Warning message:
In rfe.default(PimaIndiansDiabetes[, -9], PimaIndiansDiabetes[, :
Metric 'ROC' is not created by the summary function; 'Accuracy' will be used instead"

I've identified that the use of rerank = TRUE is causing the error. If I remove that, it works. But I sort of want rerank on all the other combos of variables, so I'd like to be able to run it with rerank = TRUE. Any ideas? Have I missed something?

Thanks for caret!
Scott

@hadjipantelis
Contributor

Hello Scott,

I cannot reproduce the error you are getting. Which version of caret are you using? Can you please append your sessionInfo() output? In addition, if you install the GitHub version, do you still get this error? (I get the warning but not the error.)

Pantelis

@topepo
Owner

topepo commented Dec 8, 2016

It is a bug in the glm variable importance code and not in rfe:

> mod1 <- glm(diabetes ~ pregnant + glucose, data = PimaIndiansDiabetes, 
+             family = binomial())
> mod2 <- glm(diabetes ~ glucose, data = PimaIndiansDiabetes, 
+             family = binomial())
> varImp(mod1)
           Overall
pregnant  4.817846
glucose  11.321631
> varImp(mod2)
   Overall
1 11.64695

Damn you drop = TRUE!!!
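
To make the drop = TRUE problem concrete, here is a minimal base R sketch. It assumes a hand-made matrix (the numbers and the name `values` are illustrative, not output from the models above) shaped like the coefficient table of a one-predictor glm. It is not the caret source, only a reproduction of the subsetting behaviour that loses the row name.

```r
# Made-up coefficient table shaped like summary(glm_fit)$coef for a
# one-predictor model; the numbers are illustrative only.
values <- matrix(
  c(-5.35, 0.038,    # Estimate
     0.42, 0.0033,   # Std. Error
    -12.7, 11.6),    # z value
  nrow = 2,
  dimnames = list(c("(Intercept)", "glucose"),
                  c("Estimate", "Std. Error", "z value"))
)

# Default drop = TRUE: the 1 x 1 result collapses to a bare number,
# so the "glucose" row name is lost.
dropped <- abs(values[-1, "z value"])
names(dropped)   # NULL

# drop = FALSE keeps the 1 x 1 matrix and its dimnames.
kept <- abs(values[-1, "z value", drop = FALSE])
rownames(kept)   # "glucose"

# data.frame() therefore labels the row "1" in the first case
# and "glucose" in the second.
rownames(data.frame(Overall = dropped))
rownames(data.frame(kept))
```

With two or more predictors the dropped result is still a named vector, which would explain why only the one-variable step fails.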

I'll fix the method, but in the meantime, you can use:

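lrFuncs2 <- lrFuncs
# Rank predictors by the absolute value of their test statistic.
lrFuncs2$rank <- function (object, x, y) 
{
  # Coefficient table of the fitted model
  values <- summary(object)$coef
  # Drop the intercept row; drop = FALSE keeps the matrix shape (and the
  # row name) even when only one predictor is left
  varImps <- abs(values[-1, grep("value$", colnames(values)), drop = FALSE])
  vimp <- data.frame(varImps)
  colnames(vimp) <- "Overall"
  if(!is.null(names(varImps))) rownames(vimp) <- names(varImps)
  
  # Most important predictor first; rfe expects the names in a 'var' column
  vimp <- vimp[order(vimp$Overall, decreasing = TRUE), , drop = FALSE]
  vimp$var <- rownames(vimp)
  vimp
}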


ctrl <- rfeControl(
  functions = lrFuncs2,
  method = 'boot',
  number = 5,
  rerank = TRUE,
  verbose = TRUE
)

varSeq <- c(1:4)
set.seed(721)
lrRFE <- rfe(PimaIndiansDiabetes[,-9],
             PimaIndiansDiabetes[,9],
             sizes = varSeq,
             metric = "ROC",
             rfeControl = ctrl)
lrRFE

topepo pushed a commit that referenced this issue Dec 8, 2016
@sco-lo-digital
Author

Thank you both and sorry for the late reply!
