Wrong OOB estimates with cforest method #351

Closed
asardaes opened this issue Jan 13, 2016 · 5 comments

@asardaes
Contributor

This is a tricky one, and it took me a while to figure it out.

When using "oob" as the training method, the default function to obtain Accuracy and Kappa is the following:

obs <- x@data@get("response")[,1]
pred <- predict(x, x@data@get("input"), OOB = TRUE)
postResample(pred, obs)

The second argument to predict is the newdata parameter. This should be fine, since x@data@get("input") just loads the input data. However, looking at the source code of the party package, I realized that the call to the actual prediction function (which is a C function) uses the boolean expression OOB && is.null(newdata) to determine which output to give. Therefore, the only way to get the true OOB estimate is to call the predict generic with newdata = NULL, which is the default when the function is called as predict(x, OOB = TRUE).
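In other words, the fix is simply to drop the second argument so that newdata stays NULL. A minimal sketch of the corrected module function, based on the snippet above:

obs <- x@data@get("response")[,1]
# newdata is left as NULL, so party returns the true OOB predictions
pred <- predict(x, OOB = TRUE)
postResample(pred, obs)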

I realized I was getting OOB Accuracy estimates that were much larger than the one reported for finalModel, and this is the reason.


Proof:

library(party)

data(mtcars)

cforest_fit <- cforest(mpg ~ ., data = mtcars, controls = cforest_unbiased(mtry = 0))

# True OOB prediction
pred1 <- predict(cforest_fit, OOB = TRUE)

# This should be equal to 'pred1'...
pred2 <- predict(cforest_fit, newdata = cforest_fit@data@get("input"), OOB = TRUE)

# Prediction without using OOB
pred3 <- predict(cforest_fit, OOB = FALSE)

# This is FALSE
identical(pred1, pred2)

# This is actually TRUE, i.e. 'pred2' effectively ignores OOB = TRUE
identical(pred2, pred3)
@Sandy4321

So we have all been using it incorrectly?


@topepo
Owner

topepo commented Jan 13, 2016

Wow, nice catch.

I'll remove the second argument. In the meantime, you can just redefine the oob module to get the correct results.
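For anyone who needs this before the fix is released, a minimal sketch of that workaround, assuming caret's getModelInfo() and its custom-method interface (the cf_mod name is just illustrative):

library(caret)

# Fetch caret's built-in cforest module and override its oob function
cf_mod <- getModelInfo("cforest", regex = FALSE)[[1]]
cf_mod$oob <- function(x) {
  obs <- x@data@get("response")[, 1]
  # No newdata argument, so party returns the true OOB predictions
  pred <- predict(x, OOB = TRUE)
  postResample(pred, obs)
}

# The modified list can then be passed as the method, e.g.
# train(mpg ~ ., data = mtcars, method = cf_mod,
#       trControl = trainControl(method = "oob"))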

topepo added a commit that referenced this issue Jan 13, 2016
@topepo
Owner

topepo commented Jan 13, 2016

Please test when you have the time.

@asardaes
Contributor Author

Yes, that fixes it for me, at least.

@asardaes
Contributor Author

I should also point out that the predict generic for randomForest behaves similarly.

I see you've taken that into account during training, but be aware that if you use train with rf, predict(train_rf) != predict(train_rf$finalModel), which may or may not be desired.
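To illustrate (a sketch, assuming the randomForest package; the object names here are just for illustration):

library(randomForest)

data(mtcars)

set.seed(1)
rf_fit <- randomForest(mpg ~ ., data = mtcars)

# With no newdata, predict.randomForest returns the OOB predictions
oob_pred <- predict(rf_fit)

# With the training data passed as newdata, it predicts as usual (in-bag)
inbag_pred <- predict(rf_fit, newdata = mtcars)

# These generally differ
identical(oob_pred, inbag_pred)

Since predict.train passes the stored training data along as newdata when none is supplied, this is presumably why predict(train_rf) can differ from predict(train_rf$finalModel), which returns OOB predictions.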
