Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improving the confusionMatrix function #136

Merged
merged 10 commits into from
Apr 13, 2015
Merged

Conversation

jknowles
Copy link
Contributor

@jknowles jknowles commented Apr 6, 2015

One issue I have had with the confusionMatrix command is that if the factor levels are identical, but out of order, the function fails. The following MWE gives an example of the problem.

set.seed(442)
library(caret)
train <- twoClassSim(n = 1000, intercept = -8, linearVars = 3, 
                        noiseVars = 10, corrVars = 4, corrValue = 0.6)

ctrl <- trainControl(method = "cv", classProbs = TRUE, 
                     summaryFunction = twoClassSummary)

fullModel <- train(Class ~ ., data = train, 
                   method = "knn", 
                   preProc = c("center", "scale"), 
                   tuneLength = 8, 
                   metric = "ROC", 
                   trControl = ctrl)

dat <- train$Class
ref <- predict(fullModel)
cm1 <- confusionMatrix(dat, ref)
dat <- as.character(dat)
ref <- as.character(ref)
dat <- factor(dat, levels = c("Class2", "Class1"))
ref <- factor(ref, levels = c("Class1", "Class2"))
cm2 <- confusionMatrix(dat, ref)

There is no reason that the function cannot recognize the levels being the same and reorder them. This patch does that, and issues a warning to the user letting them know that the reference has been reordered. Alternatively, it could be modified so that the data is reordered and a warning is issued. In any case, the new function will produce a confusion matrix that is identical in metrics to the correct matrix above, but with the table inverted.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.0%) to 2.42% when pulling ae6d8ac on jknowles:confmatrixTweak into d3b6bfd on topepo:master.

@zachmayer
Copy link
Collaborator

You should add a test for when the levels are the same and for when they are different.

Other than that, it looks like a good idea to me!


Sent from Mailbox

On Mon, Apr 6, 2015 at 6:30 AM, Coveralls notifications@github.com
wrote:

Coverage Status

Coverage decreased (-0.0%) to 2.42% when pulling ae6d8ac on jknowles:confmatrixTweak into d3b6bfd on topepo:master.

Reply to this email directly or view it on GitHub:
#136 (comment)

@jknowles
Copy link
Contributor Author

jknowles commented Apr 6, 2015

Test added. It also makes sure that if the levels are different, a warning is issued, but the results are the same as if they were not out of order.

@jknowles
Copy link
Contributor Author

Actually, on second thought, don't merge this quite yet. This fails if the data has fewer levels than the reference class (i.e. the predictions coming in are only of 1 class, all predicted to be the prevalent class). While this is unhelpful, it should still make a confusion matrix -- no reason not to. I'll work on how to normalize this next.

@jknowles
Copy link
Contributor Author

Not sure why this is happening now:

ERROR: dependency ‘BradleyTerry2’ is not available for package ‘caret’

@topepo
Copy link
Owner

topepo commented Apr 13, 2015

Some cases to consider:

  • If the factor vector of predictions has a subset of factor levels of the reference factor it should be re-factored with the same levels in the same order.
  • If the factor vector of predictions has levels not found in the reference vector, an error should be thrown.

Not sure what the story is with BradleyTerry2. Try installing from source.

@jknowles
Copy link
Contributor Author

@topepo Yes, I just encountered both of those issues in my own testing -- I will add them to the function and add unit tests to ensure they function correctly.

The BradleyTerry2 thing is happening on Travis, not for my local build.

@topepo
Copy link
Owner

topepo commented Apr 13, 2015

Thanks,

Max

@topepo topepo closed this Apr 13, 2015
@topepo topepo reopened this Apr 13, 2015
topepo added a commit that referenced this pull request Apr 13, 2015
Improving the confusionMatrix function
@topepo topepo merged commit 0aee6f6 into topepo:master Apr 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants