`predict` fails if formula method is used with deleted variables #574

Closed
adatum opened this issue Jan 13, 2017 · 3 comments

adatum (Contributor) commented Jan 13, 2017

In a multi-class classification problem, predict fails to return class probabilities when newdata is not specified and the model was trained with the formula method using a deleted variable (i.e. -variable).

Reproducible example:

library(caret)

# Train LDA on iris, dropping Petal.Length through the formula
fit.lda <- train(Species ~ . -Petal.Length, 
  data = iris, 
  preProcess = c("center", "scale"), 
  trControl = trainControl(method = "repeatedcv", 
    number = 10, 
    repeats = 3, 
    classProbs = TRUE, 
    savePredictions = "final", 
    selectionFunction = "best", 
    summaryFunction = multiClassSummary), 
  method = "lda", 
  metric = "Mean_F1")

then

predict(fit.lda, type = "prob")

fails with:

Error in predict.lda(modelFit, newdata) : wrong number of variables

If the -Petal.Length is omitted in the train formula, there is no error.

This error occurs for some models (lda, knn, svmRadial, mlpML) but not others (nnet, rf, xgbTree).
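For readers unfamiliar with the -variable syntax, here is a minimal base R sketch (not caret code) of how the formula above expands. It only illustrates that the formula's predictors exclude Petal.Length while the data frame still carries that column; how caret handles this internally is not shown here.

```r
# Expand the formula against iris; "." stands for every column except the response.
tt <- terms(Species ~ . - Petal.Length, data = iris)
predictors <- attr(tt, "term.labels")
print(predictors)

# Columns that are present in the data but are not predictors in the model.
extra <- setdiff(names(iris), c("Species", predictors))
print(extra)
```

The mismatch between `predictors` and the columns of `iris` is the situation the reproducible example sets up.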

=====

Workaround

If the training data is passed explicitly as newdata, predict does return the class probabilities. Is this the correct way of doing it?

In the documentation for predict:

newdata: an optional set of data to predict on. If NULL, then the original training data are used

In the past, I always omitted newdata when retrieving predictions for the training data from the fitted model, and supplied it only when predicting on new test data.

adatum (Contributor, Author) commented Jan 18, 2017

I'm not sure the workaround is correct. According to http://stats.stackexchange.com/a/112052/130941 predict might behave differently depending on whether newdata is specified or not.

topepo (Owner) commented Mar 15, 2017

The issue is that an extra column (Petal.Length) is coming along for the ride; some model functions are tolerant of that and others are not.

I'll make a fix for the issue.
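The "extra column" failure can be reproduced outside caret. The sketch below is only an illustration under assumptions: it calls MASS::lda directly through its matrix interface, which is not necessarily how caret invokes it, and the object names (x_train, x_extra, keep) are made up for the example. It fits on three predictors, predicts with a four-column matrix to trigger the error, then subsets to the fitted columns.

```r
library(MASS)  # provides lda()

# Fit on three predictors using the matrix interface (no formula, so no terms object).
x_train <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width", "Petal.Width")])
fit <- lda(x_train, grouping = iris$Species)

# Prediction data that still carries the unused Petal.Length column.
x_extra <- as.matrix(iris[, 1:4])

# predict.lda compares the column count with the fitted group means and stops on a mismatch.
err <- tryCatch(
  {
    predict(fit, x_extra)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Keeping only the columns the model was fitted on avoids the error.
keep <- colnames(fit$means)
probs <- predict(fit, x_extra[, keep, drop = FALSE])$posterior
print(dim(probs))
```

Models whose predict methods look columns up by name rather than by count would not hit this check, which is consistent with only some methods failing.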

topepo added a commit that referenced this issue Mar 24, 2017

topepo (Owner) commented Apr 13, 2017

This should work fine in the devel version and also in the one that I just sent to CRAN.

topepo closed this Apr 13, 2017